Preface


This book is the result of a series of lectures on linear algebra and the geometry of
multidimensional spaces given in the 1950s through 1970s by Igor R. Shafarevich
at the Faculty of Mechanics and Mathematics of Moscow State University.

Notes for some of these lectures were preserved in the faculty library, and these
were used in preparing this book. We have also included some topics that were
discussed in student seminars at the time. All the material included in this book is
the result of joint work of both authors.

We employ in this book some results on the algebra of polynomials that are
usually taught in a standard course in algebra (most of which are to be found in
Chaps. 2 through 5 of this book). We have used only a few such results, without
proof: the possibility of dividing one polynomial by another with remainder; the
theorem that a polynomial with complex coefficients has a complex root; that every
polynomial with real coefficients can be factored into a product of irreducible first-
and second-degree factors; and the theorem that the number of roots of a polynomial
that is not identically zero is at most the degree of the polynomial.

To provide a visual basis for this course, it was preceded by an introductory 
course in analytic geometry, to which we shall occasionally refer. In addition, some 
topics and examples are included in this book that are not really part of a course in 
linear algebra and geometry but are provided for illustration of various topics. Such 
items are marked with an asterisk and may be omitted if desired. 

For the convenience of the reader, we present here the system of notation used
in this book. For vector spaces we use sans serif letters: $\mathsf{L}, \mathsf{M}, \mathsf{N}, \ldots$;
for vectors, we use boldface italics: $\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}, \ldots$;
for linear transformations, we use calligraphic letters: $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$;
and for the corresponding matrices, we use uppercase italic letters: $A, B, C, \ldots$.

Preliminaries


In this book we shall use a number of concepts from set theory. These ideas appear 
in most mathematics courses, and so they will be familiar to some readers. However, 
we shall recall them here for convenience. 


Sets and Mappings 

A set is a collection of arbitrarily chosen objects defined by certain precisely
specified properties (for example, the set of all real numbers, the set of all positive
numbers, the set of solutions of a given equation, the set of points that form a given
geometric figure, the set of wolves or trees in a given forest). If a set consists of
a finite number of elements, then it is said to be finite, and if not, it is said to be
infinite. We shall employ standard notation for certain important sets, denoting the
set of natural numbers by $\mathbb{N}$, the set of integers by $\mathbb{Z}$, the set of rational
numbers by $\mathbb{Q}$, the set of real numbers by $\mathbb{R}$, and the set of complex numbers
by $\mathbb{C}$. The set of natural numbers not exceeding a given natural number $n$, that is,
the set consisting of $1, 2, \ldots, n$, will be denoted by $\mathbb{N}_n$. The objects that make
up a set are called its elements or sometimes points. If $x$ is an element of the set $M$,
then we shall write $x \in M$. If we need to specify that $x$ is not an element of $M$, then
we shall write $x \notin M$.

A set $S$ consisting of certain elements of the set $M$ (that is, every element of the
set $S$ is also an element of the set $M$) is called a subset of $M$. We write $S \subset M$.
For example, $\mathbb{N}_n \subset \mathbb{N}$ for arbitrary $n$, and likewise, we have
$\mathbb{N} \subset \mathbb{Z}$, $\mathbb{Z} \subset \mathbb{Q}$, $\mathbb{Q} \subset \mathbb{R}$,
and $\mathbb{R} \subset \mathbb{C}$. A subset of $M$ consisting of elements $x_\alpha \in M$
(where the index $\alpha$ runs over a given finite or infinite set) will be denoted by
$\{x_\alpha\}$. It is convenient to include among the subsets of a set $M$ the set that
contains no elements at all. We call this set the empty set and denote it by $\emptyset$.

Let $M$ and $N$ be two arbitrary sets. The collection of all elements that belong
simultaneously to both $M$ and $N$ is called the intersection of $M$ and $N$ and is denoted
by $M \cap N$. If we have $M \cap N = \emptyset$, then we say that the sets $M$ and $N$ are
disjoint. The collection of elements belonging to either $M$ or $N$ (or to both) is called
the union of $M$ and $N$ and is denoted by $M \cup N$. Finally, the set of elements that
belong to $M$ but do not belong to $N$ is called the complement of $N$ in $M$ and is denoted
by $M \setminus N$.

We say that a set $M$ has an equivalence relation defined on it if for every pair of
elements $x$ and $y$ of $M$, either the elements $x$ and $y$ are equivalent (in which case
we write $x \sim y$) or they are inequivalent ($x \nsim y$), and if in addition, the following
conditions are satisfied:

1. Every element of $M$ is equivalent to itself: $x \sim x$ (reflexivity).

2. If $x \sim y$, then $y \sim x$ (symmetry).

3. If $x \sim y$ and $y \sim z$, then $x \sim z$ (transitivity).

If an equivalence relation is defined on a set $M$, then $M$ can be represented as the
union of a (finite or infinite) collection of sets $M_\alpha$ called equivalence classes with
the following properties:

(a) Every element $x \in M$ is contained in one and only one equivalence class $M_\alpha$.
In other words, the sets $M_\alpha$ are disjoint, and their union (finite or infinite) is the
entire set $M$.

(b) Elements $x$ and $y$ are equivalent ($x \sim y$) if and only if they belong to the same
subset $M_\alpha$.

Clearly, the converse holds as well: if we are given a representation of a set $M$
as the union of subsets $M_\alpha$ satisfying property (a), then setting $x \sim y$ if (and only
if) these elements belong to the same subset $M_\alpha$, we obtain an equivalence relation
on $M$.

From the above reasoning, it is clear that the equivalence thus defined is completely
abstract; there is no indication as to precisely how it is decided whether two
elements $x$ and $y$ are equivalent. It is necessary only that conditions 1 through 3
above be satisfied. Therefore, on a particular set $M$ one can define a wide variety of
equivalence relations.

Let us consider a few examples. Let the set $M$ be the natural numbers, that is,
$M = \mathbb{N}$. Then on this set it is possible to define an equivalence relation by
the condition that $x \sim y$ if $x$ and $y$ have the same remainder on division by a given
natural number $n$. It is clear that conditions 1 through 3 above are satisfied, and
$\mathbb{N}$ can be represented as the union of $n$ classes (in the case $n = 1$, all the natural
numbers are equivalent to each other and so there is only one class; if $n = 2$, there
are two classes, namely the even numbers and the odd numbers; and so on). Now let
$M$ be the set of points in the plane or in space. We can define an equivalence relation
by the rule that $x \sim y$ if the points $x$ and $y$ are the same distance from a given fixed
point $O$. Then the equivalence classes are all circles (in the case of the plane) or
spheres (in space) with center at $O$. If, on the other hand, we wanted to consider
two points equivalent if the distance between them is some given number, then we
would not have an equivalence relation, since transitivity would not be satisfied.
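The first example can be tried out on a finite sample of natural numbers. The following Python sketch is only an illustration: the helper names `equivalence_classes` and `at_distance`, the choice $n = 3$, and the sample $1, \ldots, 12$ are assumptions made for the example and do not come from the text.

```python
from collections import defaultdict

def equivalence_classes(elements, key):
    """Partition `elements` into classes, where x ~ y exactly when key(x) == key(y)."""
    classes = defaultdict(list)
    for x in elements:
        classes[key(x)].append(x)
    return list(classes.values())

n = 3
sample = range(1, 13)  # a finite sample of the natural numbers
classes = equivalence_classes(sample, key=lambda x: x % n)
for cls in classes:
    print(cls)  # each class collects numbers with the same remainder mod n

def at_distance(x, y, d=1):
    """The relation 'the distance between x and y equals d' (not an equivalence)."""
    return abs(x - y) == d

# 1 ~ 2 and 2 ~ 3 hold, but 1 ~ 3 fails, so transitivity is violated.
print(at_distance(1, 2), at_distance(2, 3), at_distance(1, 3))
```

Any relation of the form "same value of `key`" is automatically reflexive, symmetric, and transitive, which is why the partition into classes works for it and not for `at_distance`.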

In this book, we shall encounter several types of equivalence relations (for example,
on the set of square matrices).

A mapping from a set $M$ into a set $N$ is a rule that assigns to every element
of the set $M$ a particular element of $N$. For example, if $M$ is the set of all bears
currently alive on Earth and $N$ is the set of positive numbers, then assigning to each
bear its weight (for example in kilograms) constitutes a mapping from $M$ to $N$. We
shall call such mappings of a set $M$ into $N$ functions on $M$ with values in $N$. We
shall usually denote such an assignment by one of the letters $f, g, \ldots$ or $F, G, \ldots$.
Mappings from a set $M$ into a set $N$ are indicated with an arrow and are written thus:
$f : M \to N$. An element $y \in N$ assigned to an element $x \in M$ is called the value of
the function $f$ at the point $x$. This is written using an arrow with a tail, $f : x \mapsto y$,
or the equality $y = f(x)$. Later on, we shall frequently display mappings between
sets in the form of a diagram:

$$M \xrightarrow{\;f\;} N.$$

If the sets $M$ and $N$ coincide, then $f : M \to M$ is called a mapping of $M$ into
itself. A mapping of a set into itself that assigns to each element $x$ that same element
$x$ is called an identity mapping. It will be denoted by the letter $e$, or if it is important
to specify the underlying set $M$, by $e_M$. Thus in our notation, we have $e_M : M \to M$
and $e_M(x) = x$ for every $x \in M$.

A mapping $f : M \to N$ is called an injection or an injective mapping if different
elements of the set $M$ are assigned different elements of the set $N$, that is, it is
injective if $f(x_1) = f(x_2)$ always implies $x_1 = x_2$.

If $S$ is a subset of $N$ and $f : M \to N$ is a mapping, then the collection of all
elements $x \in M$ such that $f(x) \in S$ is called the preimage or inverse image of $S$
and is denoted by $f^{-1}(S)$. In particular, if $S$ consists of a single element $y \in N$,
then $f^{-1}(S)$ is called the preimage or inverse image of the element $y$ and is written
$f^{-1}(y)$. Using this terminology, we may say that a mapping $f : M \to N$ is
an injection if and only if for every element $y \in N$, its inverse image $f^{-1}(y)$ consists
of at most a single element. The words “at most” imply that certain elements
$y \in N$ may have an empty preimage. For example, let $M = N = \mathbb{R}$ and suppose
the mapping $f$ assigns to each real number $x$ the value $f(x) = \arctan x$. Then $f$ is
injective, since the inverse image $f^{-1}(y)$ consists of a single element if $|y| < \pi/2$ and
is the empty set if $|y| \ge \pi/2$.
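On a finite sample, preimages and injectivity can be computed directly. The sketch below is an illustration under assumptions: the helper names `preimage` and `is_injective` and the sample set are hypothetical, and checking a finite sample of $\mathbb{R}$ does not prove anything about the mapping on all of $\mathbb{R}$.

```python
import math

def preimage(f, domain, subset):
    """Return the set of x in `domain` with f(x) in `subset`."""
    return {x for x in domain if f(x) in subset}

def is_injective(f, domain):
    """True if f takes distinct values at distinct elements of the finite `domain`."""
    values = [f(x) for x in domain]
    return len(values) == len(set(values))

M = {-2, -1, 0, 1, 2}          # finite sample of the real numbers
square = lambda x: x * x

# The preimage of 4 under squaring has two elements, so squaring is not injective.
print(sorted(preimage(square, M, {4})))
print(is_injective(square, M))

# arctan is injective on the sample; the value 2.0 exceeds pi/2, so its preimage is empty.
print(is_injective(math.atan, M))
print(preimage(math.atan, M, {2.0}))
```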

If $S$ is a subset of $M$ and $f : M \to N$ is a mapping, then the collection of all
elements $y \in N$ such that $y = f(x)$ for some $x \in S$ is called the image of the subset
$S$ and is denoted by $f(S)$. In particular, the subset $S$ could be the entire set $M$, in
which case $f(M)$ is called the image of the mapping $f$. We note that the image of
$f$ does not have to consist of the entire set $N$. For example, if $M = N = \mathbb{R}$ and
$f$ is the squaring operation (raising to the second power), then $f(M)$ is the set of
nonnegative real numbers and does not coincide with the set $\mathbb{R}$.

If again $S$ is a subset of $M$ and $f : M \to N$ a mapping, then applying the mapping
only to elements of the set $S$ defines a mapping $f : S \to N$, called the restriction
of the mapping $f$ to $S$. In other words, the restriction mapping is defined by
taking $f(x)$ for each $x \in S$ as before and simply ignoring all $x \notin S$. Conversely, if
we start off with a mapping $f : S \to N$ defined only on the subset $S$, and then somehow
define $f(x)$ for the remaining elements $x \in M \setminus S$, then we obtain a mapping
$f : M \to N$, called an extension of $f$ to $M$.

A mapping $f : M \to N$ is bijective or a bijection if it is injective and the image
$f(M)$ is the entire set $N$, that is, $f(M) = N$. Equivalently, a mapping is a bijection
if for each element $y \in N$, there exists precisely one element $x \in M$ such that
$y = f(x)$.¹ In this case, it is possible to define a mapping from $N$ into $M$ that assigns to
each element $y \in N$ the unique element $x \in M$ such that $f(x) = y$. Such a mapping
is called the inverse of $f$ and is denoted by $f^{-1} : N \to M$.

Now suppose we are given sets $M$, $N$, $L$ and mappings $f : M \to N$ and $g : N \to L$,
which we display in the following diagram:

$$M \xrightarrow{\;f\;} N \xrightarrow{\;g\;} L. \tag{1}$$

Then application of $f$ followed by $g$ defines a mapping from $M$ to $L$ by the obvious
rule: first apply the mapping $f : M \to N$, which assigns to each element $x \in M$ an
element $y \in N$, and then apply the mapping $g : N \to L$ that takes an element $y$ to
some element $z \in L$. We thus obtain a mapping from $M$ to $L$ called the composition
of the mappings $f$ and $g$, written $g \circ f$ or simply $gf$. Using this notation, the
composition mapping is defined by the formula

$$(g \circ f)(x) = g(f(x)) \tag{2}$$

for an arbitrary $x \in M$. We note that in equation (2), the letters $f$ and $g$ that denote
the two mappings appear in the reverse order to that in the diagram (1). As we shall
see later, such an arrangement has a number of advantages.

As an example of the composition of mappings we offer the obvious equalities

$$e_N \circ f = f, \qquad f \circ e_M = f,$$

valid for any mapping $f : M \to N$, and likewise the equalities

$$f \circ f^{-1} = e_N, \qquad f^{-1} \circ f = e_M,$$

which are valid for any bijective mapping $f : M \to N$.

¹ Translator’s note: The term one-to-one is also used in this context. However, its use can be
confusing: an injection is sometimes called a one-to-one mapping, while a bijection is sometimes
called a one-to-one correspondence. In this book, we shall strive to stick to the terms injective and
bijective.

The composition of mappings has an important property. Suppose that in addition
to the mappings shown in diagram (1), we have as well a mapping $h : L \to K$, where
$K$ is an arbitrary set. Then we have

$$h \circ (g \circ f) = (h \circ g) \circ f. \tag{3}$$

The truth of this claim follows at once from the definitions. First of all, it is apparent
that both sides of equation (3) contain a mapping from $M$ to $K$. Thus we need to
show that when applied to any element $x \in M$, both sides give the same element of
the set $K$. According to definition (2), for the left-hand side of (3), we obtain

$$(h \circ (g \circ f))(x) = h((g \circ f)(x)), \qquad (g \circ f)(x) = g(f(x)).$$

Substituting the second equation into the first, we finally obtain
$(h \circ (g \circ f))(x) = h(g(f(x)))$. Analogous reasoning shows that we obtain precisely
the same expression for the right-hand side of equation (3).
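Formula (2) and the associativity property (3) can be spot-checked numerically. The following Python sketch is illustrative only: the helper `compose` and the particular mappings `f`, `g`, `h` are assumptions chosen for the example, and agreement on a finite sample does not replace the argument above.

```python
def compose(g, f):
    """Return the composition g o f, that is, the mapping x -> g(f(x))."""
    return lambda x: g(f(x))

identity = lambda x: x

f = lambda x: x + 1      # plays the role of f : M -> N
g = lambda y: 2 * y      # plays the role of g : N -> L
h = lambda z: z ** 2     # plays the role of h : L -> K

sample = range(-3, 4)    # finite sample of M

left = compose(h, compose(g, f))     # h o (g o f)
right = compose(compose(h, g), f)    # (h o g) o f

print(all(left(x) == right(x) for x in sample))
print(compose(g, f)(1), compose(f, g)(1))  # the order of composition matters
```

Note that `compose(g, f)` applies `f` first, matching the reversed order of letters in formula (2).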

The property expressed by formula (3) is called associativity. Associativity plays
an important role, both in this course and in other branches of mathematics. Therefore,
we shall pause here to consider this concept in more detail. For the sake of
generality, we shall consider a set $M$ of arbitrary objects (they can be numbers,
matrices, mappings, and so on) on which is defined the operation of multiplication
associating two elements $a \in M$ and $b \in M$ with some element $ab \in M$, which we
call the product, such that it possesses the associative property:

$$(ab)c = a(bc). \tag{4}$$

The point of condition (4) is that without it, we can calculate the product of elements
$a_1, \ldots, a_m$ for $m > 2$ only if the sequence of multiplications is indicated by
parentheses, indicating which pairs of adjacent elements we are allowed to multiply.
For example, with $m = 3$, we have two possible arrangements of the parentheses:
$(a_1 a_2) a_3$ and $a_1 (a_2 a_3)$. For $m = 4$ we have five variants:

$$((a_1 a_2) a_3) a_4, \quad (a_1 (a_2 a_3)) a_4, \quad (a_1 a_2)(a_3 a_4), \quad a_1 ((a_2 a_3) a_4), \quad a_1 (a_2 (a_3 a_4)),$$

and so on. It turns out that if for three factors ($m = 3$), the product does not depend
on how the parentheses are ordered (that is, the associative property is satisfied),
then it will be independent of the arrangement of parentheses with any number of
factors.

This assertion is easily proved by induction on $m$. Indeed, let us suppose that
it is true for all products of $m$ or fewer elements, and let us consider products
of $m + 1$ elements $a_1, \ldots, a_m, a_{m+1}$ for all possible arrangements of parentheses.
It is easily seen that in this case, there are two possible alternatives: either
there is no parenthesis between elements $a_m$ and $a_{m+1}$, or else there is one.
Since by the induction hypothesis, the assertion is correct for $a_1, \ldots, a_m$, then in
the first case we obtain the product $(a_1 \cdots a_{m-1})(a_m a_{m+1})$, while in the second
case, we have $(a_1 \cdots a_m) a_{m+1} = ((a_1 \cdots a_{m-1}) a_m) a_{m+1}$. Introducing the notation
$a = a_1 \cdots a_{m-1}$, $b = a_m$, and $c = a_{m+1}$, we obtain the products $a(bc)$ and $(ab)c$,
the equality of which follows from property (4).
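The count of arrangements and the role of associativity can be explored by brute force. In the Python sketch below, the generator `parenthesizations` is a hypothetical helper written for this illustration; string concatenation stands in for an associative multiplication and subtraction for a nonassociative one.

```python
def parenthesizations(items, op):
    """Yield (expression, value) for each way of fully parenthesizing `items`."""
    if len(items) == 1:
        yield str(items[0]), items[0]
        return
    # The outermost multiplication splits the factors into a left and a right block.
    for split in range(1, len(items)):
        for left_expr, left_val in parenthesizations(items[:split], op):
            for right_expr, right_val in parenthesizations(items[split:], op):
                yield f"({left_expr} {right_expr})", op(left_val, right_val)

concat = list(parenthesizations(["a", "b", "c", "d"], lambda u, v: u + v))
minus = list(parenthesizations([8, 4, 2, 1], lambda u, v: u - v))

print(len(concat))                             # 5 arrangements for m = 4
print({value for _, value in concat})          # associative: a single value
print(sorted({value for _, value in minus}))   # not associative: [1, 3, 5, 7]
```

For the associative operation every arrangement gives the same product, as the induction argument predicts; for subtraction the five arrangements give four different values.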

In the special case $a_1 = \cdots = a_m = a$, the product $a_1 \cdots a_m$ is denoted by $a^m$ and
is called the $m$th power of the element $a$.

There is another important concept connected to the composition of mappings.
Let $R$ be a given set. We shall denote by $\mathfrak{F}(M, R)$ the collection of all mappings
$M \to R$, and analogously, by $\mathfrak{F}(N, R)$ the collection of all mappings $N \to R$.
Then with every mapping $f : M \to N$ is associated the particular mapping
$f^* : \mathfrak{F}(N, R) \to \mathfrak{F}(M, R)$, called the dual to $f$ and defined as follows: For every
mapping $\varphi \in \mathfrak{F}(N, R)$ it assigns the mapping $f^*(\varphi) \in \mathfrak{F}(M, R)$ according to the formula

$$f^*(\varphi) = \varphi \circ f. \tag{5}$$

Formula (5) indicates that for an arbitrary element $x \in M$, we have the equality
$f^*(\varphi)(x) = (\varphi \circ f)(x)$, which can also be expressed by the following diagram:

[Diagram: a triangle with arrows $f : M \to N$, $\varphi : N \to R$, and $f^*(\varphi) : M \to R$.]

Here we become acquainted with the following general mathematical fact: Functions
are written in reverse order in comparison with the order of the sets on which
they are defined. This phenomenon will appear in our book, as well as in other
courses in relationship to more complex objects (such as differential forms).

The dual mapping $f^*$ possesses the following important property: If we have
mappings of sets, as depicted in diagram (1), then

$$(g \circ f)^* = f^* \circ g^*. \tag{6}$$

Indeed, we obtain the dual mappings

$$\mathfrak{F}(L, R) \xrightarrow{\;g^*\;} \mathfrak{F}(N, R) \xrightarrow{\;f^*\;} \mathfrak{F}(M, R).$$

By definition, for $g \circ f : M \to L$, the dual mapping $(g \circ f)^*$ is a mapping from
$\mathfrak{F}(L, R)$ into $\mathfrak{F}(M, R)$. As can be seen from (2), $f^* \circ g^*$ is also a mapping of the
same sets. It remains for us to show that $(g \circ f)^*$ and $f^* \circ g^*$ take every element
$\psi \in \mathfrak{F}(L, R)$ to one and the same element of the set $\mathfrak{F}(M, R)$. By (5), we have

$$(g \circ f)^*(\psi) = \psi \circ (g \circ f).$$

Analogously, taking into account (2), we obtain the relationship

$$(f^* \circ g^*)(\psi) = f^*(g^*(\psi)) = f^*(\psi \circ g) = (\psi \circ g) \circ f.$$

Thus for a proof of equality (6), it suffices to verify associativity:
$\psi \circ (g \circ f) = (\psi \circ g) \circ f$.
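The dual mapping and property (6) translate almost literally into code when mappings are represented as functions. The sketch below is an illustration under assumptions: `compose` and `dual` are hypothetical helper names, and the concrete `f`, `g`, `psi` are chosen only for the example.

```python
def compose(g, f):
    """Return g o f, the mapping x -> g(f(x))."""
    return lambda x: g(f(x))

def dual(f):
    """Return f*, which sends a mapping phi on N to the mapping phi o f on M."""
    return lambda phi: compose(phi, f)

f = lambda x: x + 1       # plays the role of f : M -> N
g = lambda y: 3 * y       # plays the role of g : N -> L
psi = lambda z: z * z     # an element of F(L, R)

lhs = dual(compose(g, f))(psi)           # (g o f)*(psi) = psi o (g o f)
rhs = compose(dual(f), dual(g))(psi)     # (f* o g*)(psi) = (psi o g) o f

print(all(lhs(x) == rhs(x) for x in range(-5, 6)))
```

Observe that `dual(f)` is applied after `dual(g)`, which is the reversal of order expressed by (6).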

Up to now, we have considered mappings (functions) of a single argument. The
definition of functions of several arguments is reduced to this notion with the help
of the operation of product of sets.

Let $M_1, \ldots, M_n$ be arbitrary sets. Consider the ordered collection $(x_1, \ldots, x_n)$,
where $x_i$ is an arbitrary element of the set $M_i$. The word “ordered” indicates that
in such collections, the order of the sequence of elements $x_i$ is taken into account.
For example, in the case $n = 2$ and $M_1 = M_2$, the pairs $(x_1, x_2)$ and $(x_2, x_1)$ are
considered to be different if $x_1 \ne x_2$. A set consisting of all ordered collections
$(x_1, \ldots, x_n)$ is called the product of the sets $M_1, \ldots, M_n$ and is denoted by
$M_1 \times \cdots \times M_n$.

In the special case $M_1 = \cdots = M_n = M$, the product $M_1 \times \cdots \times M_n$ is denoted
by $M^n$ and is called the $n$th power of the set $M$.
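For finite sets, the product and the $n$th power can be enumerated with the standard library function `itertools.product`. The sets `M1` and `M2` in this Python sketch are arbitrary examples chosen for illustration.

```python
from itertools import product

M1 = ["a", "b"]
M2 = [0, 1, 2]

pairs = list(product(M1, M2))          # the product M1 x M2 as ordered pairs
square = list(product(M2, repeat=2))   # the second power of M2

print(len(pairs), len(square))         # 2 * 3 = 6 pairs and 3 * 3 = 9 pairs
print((0, 1) in square, (1, 0) in square, (0, 1) == (1, 0))  # order is taken into account
```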

Now we can define a function of an arbitrary number of arguments, each of which
assumes values from “its own” set. Let $M_1, \ldots, M_n$ be arbitrary sets, and let us
define $M = M_1 \times \cdots \times M_n$. By definition, the mapping $f : M \to N$ assigns to
each element $x \in M$ a certain element $y \in N$, that is, it assigns to $n$ elements
$x_1 \in M_1, \ldots, x_n \in M_n$, taken in the assigned order, the element $y = f(x_1, \ldots, x_n)$ of the
set $N$. This is a function of $n$ arguments $x_i$, each of which takes values from “its
own” set $M_i$.


Some Topological Notions 

Up to now, we have been speaking about sets of arbitrary form, not assuming that
they possess any additional properties. Generally, that will not suffice. For example,
let us assume that we wish to compare two geometric figures, in particular, to determine
the extent to which they are or are not “alike.” Let us consider the two figures
to be sets whose elements are points in a plane or in space. If we wish to limit ourselves
to the concepts introduced above, then it is natural to consider “alike” those
sets between which there exists a bijection. However, toward the end of the nineteenth
century, Georg Cantor demonstrated that there exists a bijection between the
points of a line segment and those of the interior of a square.² At the same time,
Richard Dedekind conjectured that our intuitive idea of “alikeness” of figures is
connected with the possibility of establishing between them a continuous bijection.
But for that, it is necessary to define what it means for a mapping to be continuous.

The branch of mathematics in which one studies continuous mappings of abstract
sets and considers objects with a precision only up to bijective continuous mappings
is called topology. Using the words of Hermann Weyl, we may say that in this book,
“the mountain range of topology will loom on the horizon.” More precisely, we
shall introduce some topological notions only now and then, and then only the simplest
ones. We shall formulate them now, but we shall appeal to them seldom, and
only to indicate a connection between the objects that we are considering and other
branches of mathematics to which the reader may be introduced in more detail in
other courses or textbooks. Such instances can be read or passed over as desired;
they will not be used in the remainder of the book. To define a continuous mapping
$f : M \to N$ it is necessary first to define the notion of convergence on the sets $M$
and $N$. In some cases, we will define convergence on sets (for example, in spaces
of vectors, spaces of matrices, or projective spaces), based on the notion of convergence
in $\mathbb{R}$ and $\mathbb{C}$, which is assumed to be familiar to the reader from a course in
calculus. In other cases, we shall make use of the notion of metric.

² This result so surprised him that, as Cantor wrote in a letter, he believed for a long time that it was
incorrect.

A set $M$ is called a metric space if there exists a function $r : M^2 \to \mathbb{R}$ assigning
to every pair of points $x, y \in M$ a number $r(x, y)$ that satisfies the following
conditions:

1. $r(x, y) > 0$ for $x \ne y$, and $r(x, x) = 0$, for every $x, y \in M$.

2. $r(x, y) = r(y, x)$ for every $x, y \in M$.

3. For any three points $x, y, z \in M$ one has the inequality

$$r(x, z) \le r(x, y) + r(y, z). \tag{7}$$

Such a function $r(x, y)$ is called a metric or distance on $M$, and the properties
enumerated in its definition constitute an axiomatization of the usual properties of
distance known from courses in elementary or analytic geometry.

For example, the set $\mathbb{R}$ of all real numbers (and also any subset of it) becomes
a metric space if for every pair of numbers $x$ and $y$ we introduce the function
$r(x, y) = |x - y|$ or $r(x, y) = \sqrt{|x - y|}$.
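The three conditions can be spot-checked on a finite set of sample points. The Python sketch below is an illustration only: the helper `is_metric_on`, the tolerance, and the sample points are assumptions, and the function $(x - y)^2$ is an added counterexample that does not appear in the text. Passing the check on a sample is of course not a proof.

```python
import itertools
import math

def is_metric_on(points, r, tol=1e-12):
    """Check conditions 1-3 of a metric for all pairs and triples of `points`."""
    for x, y in itertools.product(points, repeat=2):
        if x == y and r(x, y) != 0:
            return False                      # condition 1: r(x, x) = 0
        if x != y and not r(x, y) > 0:
            return False                      # condition 1: r(x, y) > 0 for x != y
        if r(x, y) != r(y, x):
            return False                      # condition 2: symmetry
    for x, y, z in itertools.product(points, repeat=3):
        if r(x, z) > r(x, y) + r(y, z) + tol:
            return False                      # condition 3: inequality (7)
    return True

points = [-2.0, -0.5, 0.0, 1.0, 3.5]
abs_metric = lambda x, y: abs(x - y)
sqrt_metric = lambda x, y: math.sqrt(abs(x - y))
squared = lambda x, y: (x - y) ** 2           # violates inequality (7)

print(is_metric_on(points, abs_metric))
print(is_metric_on(points, sqrt_metric))
print(is_metric_on(points, squared))
```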

For an arbitrary metric space there is automatically defined the notion of convergence
of points in the space: a sequence of points $x_k$ converges to the point $x$ as
$k \to \infty$ (notation: $x_k \to x$) if $r(x_k, x) \to 0$ as $k \to \infty$. The point $x$ in this case is
called the limit of the sequence $x_k$.

Let $X \subset M$ be some subset of $M$, and $M$ a metric space with the metric $r(x, y)$,
that is, a mapping $r : M^2 \to \mathbb{R}$ satisfying the three properties given above. It is clear
that the restriction of $r(x, y)$ to the subset $X^2 \subset M^2$ also satisfies those properties,
and hence it defines a metric on $X$. We say that $X$ is a metric space with the metric
induced by the metric of the enclosing space $M$ or that $X \subset M$ is a metric subspace.

The subset $X$ is said to be closed in $M$ if it contains the limit point of every
convergent sequence in $X$, and it is said to be bounded if there exist a point $x \in X$
and a number $c > 0$ such that $r(x, y) \le c$ for all $y \in X$.

Let $M$ and $N$ be sets on each of which is defined the notion of convergence (for
example, $M$ and $N$ could be metric spaces). A mapping $f : M \to N$ is said to be
continuous at the point $x \in M$ if for every convergent sequence $x_k \to x$ of points
in the set $M$, one has $f(x_k) \to f(x)$. If the mapping $f : M \to N$ is continuous at
every point $x \in M$, then we say that it is continuous on the set $M$ or simply that it is
continuous.

The mapping $f : M \to N$ is called a homeomorphism if it is bijective and both $f$ and
its inverse mapping $f^{-1} : N \to M$ are continuous.³ The sets
$M$ and $N$ are said to be homeomorphic or topologically equivalent if there exists
a homeomorphism $f : M \to N$. It is easily seen that the property among sets of
being homeomorphic (for a given fixed definition of convergence) is an equivalence
relation.

³ We wish to emphasize that this last condition is essential: from the continuity of $f$ one may not
conclude the continuity of $f^{-1}$.

Given two infinite sets $M$ and $N$ on which no metrics have initially been defined,
if we then supply them with metrics using first one definition and then another, we
will obtain differing notions of homeomorphism $f : M \to N$, and it can turn out
that in one type of metric, $M$ and $N$ are homeomorphic, while in another type they
are not. For example, on arbitrary sets $M$ and $N$ let us define what is called the
discrete metric, defined by the relations $r(x, y) = 1$ for all $x \ne y$ and $r(x, x) = 0$
for all $x$. It is clear that with such a definition, all the properties of a metric are
satisfied, but the notion of homeomorphism $f : M \to N$ becomes empty: it simply
coincides with the notion of bijection. For indeed, in the discrete metric, a sequence
$x_k$ converges to $x$ if and only if, beginning with some index $k$, all the points $x_k$ are
equal to $x$. As follows from the definition of continuous mapping given above, this
means that every mapping $f : M \to N$ is continuous.

Fig. 1 Homeomorphic and nonhomeomorphic curves (the symbol $\sim$ means that the figures are
homeomorphic, while $\nsim$ means that they are not)

For example, according to a theorem of Cantor, a line segment and a square are
homeomorphic under the discrete metric, but if we consider them, for example, as
metric spaces in the plane on which distance is defined as in a course in elementary
geometry (let us say using the system of Cartesian coordinates), then the two sets
are no longer homeomorphic.

This shows that the discrete metric fails to reflect some important properties of
distance with which we are familiar from courses in geometry, one of which is that
for an arbitrarily small number $\varepsilon > 0$, there exist two distinct points $x$ and $y$ for
which $r(x, y) < \varepsilon$. Therefore, if we are to formulate our intuitive idea of “geometric
similarity” of two sets $M$ and $N$, it is necessary to consider them not with an
arbitrary metric, but with a metric that reflects these geometric notions.

We are not going to go more deeply into this question, since for our purposes that
is unnecessary. In this book, when we “compare” sets $M$ and $N$, where at least one
of them (say $N$) is a geometric figure in the plane (or in space), then distance will be
determined in the usual way, with the metric on $N$ induced by the metric in the plane
(or in the space) in which it lies. It remains for us to define the metric (or notion of
convergence) on the set $M$ in such a way that $M$ and $N$ are homeomorphic. That is
how we shall make precise the idea of comparison.

If the figures $M$ and $N$ are metric subspaces of the plane or space with distance
defined as in elementary geometry, then there exists for them a very graphic interpretation
of the concept of topological equivalence. Imagine that figures $M$ and $N$
are made out of rubber. Then their being homeomorphic means that we can deform
$M$ into $N$ without tearing and without gluing together any points. This last condition
(“without tearing and without gluing together any points”) is what makes the
notion of homeomorphism much stronger than simply a bijective mapping of sets.

For example, an arbitrary continuous closed curve without self-intersection (for 
example, a triangle or square) is homeomorphic to a circle. On the other hand, a con- 
tinuous closed curve with self-intersection (say a figure eight) is not homeomorphic 
to a circle (see Fig. 1). 

In Fig. 2 we have likewise depicted examples of homeomorphic and nonhomeomorphic
figures, this time in three-dimensional space.

We conclude by introducing a few additional simple topological concepts that 
will be used in this book. 


Fig. 2 Homeomorphic and nonhomeomorphic surfaces (panel labels: handle (weight), doughnut, two handles)


A path in a metric space $M$ is a continuous mapping $f : I \to M$, where $I$ is
an interval of the real line. Without any loss of generality, we may assume that
$I = [0, 1]$. In this case, the points $f(0)$ and $f(1)$ are called the beginning and end
of the path. Two points $x, y \in M$ are said to be continuously deformable into each
other if there is a path in which $x$ is the beginning and $y$ is the end. Such a path
is called a deformation of $x$ into $y$, and we shall notate the fact that $x$ and $y$ are
deformable into one another by $x \sim y$.

The property for elements of a space $M$ to be continuously deformable into one
another is an equivalence relation on $M$, since properties 1 through 3 that define such
a relation are satisfied. Indeed, the reflexive property is obvious. To prove symmetry,
it suffices to observe that if $f(t)$ is a deformation of $x$ into $y$, then $f(1 - t)$ is a
deformation of $y$ into $x$. Now let us verify transitivity. Let $x \sim y$ and $y \sim z$, $f(t)$
a deformation of $x$ into $y$, and $g(t)$ a deformation of $y$ into $z$. Then the mapping
$h : I \to M$ determined by the equality $h(t) = f(2t)$ for $t \in [0, \tfrac{1}{2}]$ and the equality
$h(t) = g(2t - 1)$ for $t \in [\tfrac{1}{2}, 1]$ is continuous (the two formulas agree at $t = \tfrac{1}{2}$,
since $f(1) = y = g(0)$), and for this mapping, the equalities
$h(0) = f(0) = x$, $h(1) = g(1) = z$ are satisfied. Thus $h(t)$ gives the continuous
deformation of the point $x$ to $z$, and therefore we have $x \sim z$.
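The constructions used in the proof of symmetry and transitivity can be written out concretely for paths in the plane. In the Python sketch below, the helpers `segment`, `reverse`, and `concatenate` and the three sample points are assumptions made for illustration; straight-line segments are used only because they are the simplest paths to write down.

```python
def segment(p, q):
    """Straight-line path from p to q in the plane, parametrized by t in [0, 1]."""
    return lambda t: tuple((1 - t) * a + t * b for a, b in zip(p, q))

def reverse(f):
    """The path t -> f(1 - t), which deforms the end of f into its beginning."""
    return lambda t: f(1 - t)

def concatenate(f, g):
    """The path h with h(t) = f(2t) on [0, 1/2] and h(t) = g(2t - 1) on [1/2, 1]."""
    def h(t):
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        return f(2 * t) if t <= 0.5 else g(2 * t - 1)
    return h

x, y, z = (0.0, 0.0), (1.0, 0.0), (1.0, 2.0)
f = segment(x, y)          # deformation of x into y
g = segment(y, z)          # deformation of y into z
h = concatenate(f, g)      # deformation of x into z

print(h(0.0), h(0.5), h(1.0))
print(reverse(f)(0.0), reverse(f)(1.0))
```

The value `h(0.5)` is computed from `f`, but `g(0)` gives the same point, which is exactly the agreement at $t = 1/2$ needed for continuity.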

If every pair of elements of a metric space $M$ can be deformed one into the other
(that is, the relationship $\sim$ defines a single equivalence class), then the space $M$ is
said to be path-connected. If that is not the case, then for each element $x \in M$ we
consider the equivalence class $M_x$ consisting of all elements $y \in M$ such that $x \sim y$.
By the definition of equivalence class, the metric space $M_x$ will be path-connected.
It is called the path-connected component of the space $M$ containing the point $x$.
Thus the equivalence relation defined by a continuous deformation decomposes $M$
into path-connected components.

In a number of important cases, the number of components is finite, and we
obtain the representation $M = M_1 \cup \cdots \cup M_k$, where $M_i \cap M_j = \emptyset$ for $i \ne j$ and
each $M_i$ is path-connected. It is easily seen that such a representation is unique. The
sets $M_i$ are called the path-connected components of the space $M$.

For example, a hyperboloid of one sheet, a sphere, and a cone are each path-connected,
but a hyperboloid of two sheets is not: it has two path-connected components.
The set of real numbers defined by the condition $0 < |x| < 1$ has two
path-connected components (one containing positive numbers; the other, negative
numbers), while the set of complex numbers defined by the same condition is path-connected.
The properties preserved by homeomorphisms are called topological
properties. Thus, for example, the property of path-connectedness is topological,
as is the number of path-connected components.

Let $M$ and $N$ be metric spaces (let us denote their respective metrics by $r$ and $r'$).
A mapping $f : M \to N$ is called an isometry if it is bijective and preserves distances
between points, that is,

$$r(x_1, x_2) = r'(f(x_1), f(x_2)) \tag{8}$$

for every pair of points $x_1, x_2 \in M$. From the relationship (8), it follows automatically
that an isometry is an embedding (that is, an injective mapping). Indeed, if there
existed points $x_1 \ne x_2$ in the set $M$ for which the equation $f(x_1) = f(x_2)$ were
satisfied, then from condition 1 in the definition of a metric space, the left-hand side
of (8) would be different from zero, while the right-hand side would be equal to zero.
Therefore, the requirement of a bijective mapping is here reduced to the condition
that the image $f(M)$ coincide with all of the set $N$.

Metric spaces $M$ and $N$ are called isometric or metrically equivalent if there exists
an isometry $f : M \to N$. It is easy to see that an isometry is a homeomorphism
and generalizes the notion of the motion of a rigid body in space, whereby we cannot
arbitrarily deform the sets $M$ and $N$ into one another as if they were made of
rubber (without tearing and gluing). We can only treat them as if they were rigid
or made of flexible, but not compressible or stretchable, materials (for example, an
isometry of a piece of paper is obtained by bending it or rolling it up).

In the plane or in space with distance determined by the familiar methods of
elementary geometry, examples of isometries are parallel translations, rotations, and
symmetry transformations. Thus, for example, two triangles in the plane are
isometric if and only if they are “equal” (that is, congruent in the sense defined in
courses in school geometry, namely equality of sides and angles), and two ellipses
are isometric if and only if they have equal major and minor axes.

In conclusion, we observe that in the definition of homeomorphism,
path-connectedness, and path-connected component, the notion of metric played only
an auxiliary role. We used it to define the notion of convergence of a sequence of
points, so that we could speak of continuity of a mapping and thereby introduce
concepts that depend on this notion. It is convergence that is the basic topological
notion. It can be defined by various metrics, and it can also be defined in another
way, as is usually done in topology.

Linear Equations


1.1 Linear Equations and Functions 

In this chapter, we will be studying systems of equations of degree one. We shall
let the number of equations and number of unknowns be arbitrary. We begin by
choosing suitable notation. Since the number of unknowns can be arbitrarily large,
it will not suffice to use the twenty-six letters of the alphabet: x, y, ..., z, and so on.
Therefore, we shall use a single letter to designate all the unknowns and distinguish
among them with an index, or subscript: $x_1, x_2, \ldots, x_n$, where n is the number of
unknowns. The coefficients of our equations will be notated using the same principle,
and a single equation of the first degree will be written thus:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b. \tag{1.1}$$

A first-degree equation is also called a linear equation.

We shall use the same principle to distinguish among the various equations. But
since we have already used one index for designating the coefficients of the
unknowns, we introduce a second index. We shall denote the coefficient of $x_k$ in the
$i$th equation by $a_{ik}$. To the right side of the $i$th equation we attach the symbol $b_i$.
Therefore, the $i$th equation is written

$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = b_i, \tag{1.2}$$

and a system of m equations in n unknowns will look like this:

$$\begin{cases}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1,\\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2,\\
\qquad\cdots\cdots\cdots\\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m.
\end{cases} \tag{1.3}$$
The numbers $b_1, \ldots, b_m$ are called the constant terms or just constants of the system
(1.3). It will sometimes be convenient to focus our attention on the coefficients of
the unknowns in system (1.3), and then we shall use the following tableau:

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} \tag{1.4}$$


with m rows and n columns. Such a rectangular array of numbers is called an $m \times n$
matrix or a matrix of type $(m, n)$, and the numbers $a_{ij}$ are called the elements of
the matrix. If $m = n$, then the matrix is an $n \times n$ square matrix. In this case, the
elements $a_{11}, a_{22}, \ldots, a_{nn}$, each located in a row and column with the same index,
form the matrix’s main diagonal.

The matrix (1.4), whose elements are the coefficients of the unknowns of system
(1.3), is called the matrix associated with the system. Along with the matrix (1.4), it
is frequently necessary to consider the matrix that includes the constant terms:

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{pmatrix} \tag{1.5}$$


This matrix has one column more than matrix (1.4), and thus it is an $m \times (n + 1)$
matrix. Matrix (1.5) is called the augmented matrix of the system (1.3).

Let us consider in greater detail the left-hand side of equation (1.1). Here we
are usually talking about trying to find specific values of the unknowns $x_1, \ldots, x_n$
that satisfy the relationship (1.1). But it is also possible to consider the expression
$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$ from another point of view. We can substitute arbitrary
numbers

$$x_1 = c_1, \quad x_2 = c_2, \quad \ldots, \quad x_n = c_n \tag{1.6}$$

for the unknowns $x_1, x_2, \ldots, x_n$ in the expression, each time obtaining as a result a
certain number

$$a_1 c_1 + a_2 c_2 + \cdots + a_n c_n. \tag{1.7}$$

From this point of view, we are dealing with a certain type of function. In the given
situation, the initial element to which we are associating something is the set of
values (1.6), which is determined simply by the set of numbers $(c_1, c_2, \ldots, c_n)$. We
shall call such a set of numbers a row of length n. It is the same as a $1 \times n$ matrix.
We associate the expression (1.7), which is a number, with the row $(c_1, c_2, \ldots, c_n)$.
Then employing the notation of page xiii, we obtain a function on the set M with
values in N, where M is the set of all rows of length n, and N is the set of all
numbers.

Definition 1.1 A function F on the set of all rows of length n with values in the set
of all numbers is said to be linear if there exist numbers $a_1, a_2, \ldots, a_n$ such that F
associates to each row $(c_1, c_2, \ldots, c_n)$ the number (1.7).

We shall proceed to denote a row by a single boldface italic letter, such as c,
and shall associate with it a number, $F(c)$, via the linear function F. Thus if
$c = (c_1, c_2, \ldots, c_n)$, then $F(c) = a_1 c_1 + a_2 c_2 + \cdots + a_n c_n$.

In the case n = 1, a linear function coincides with the well-known concept of
direct proportionality, which will be familiar to the reader from secondary-school
mathematics. Thus the notion of linear function is a natural generalization of direct
proportionality. To emphasize this analogy, we shall define some operations on rows
of length n in analogy to arithmetic operations on numbers.

Definition 1.2 Let c and d be rows of a fixed length n, that is,

$$c = (c_1, c_2, \ldots, c_n), \qquad d = (d_1, d_2, \ldots, d_n).$$

Their sum is the row $(c_1 + d_1, c_2 + d_2, \ldots, c_n + d_n)$, denoted by $c + d$. The product
of row c and the number p is the row $(pc_1, pc_2, \ldots, pc_n)$, denoted by $pc$.
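
As an illustration, the operations of Definition 1.2 and a linear function in the sense of Definition 1.1 might be sketched in Python as follows. The names `row_add`, `row_scale`, and `linear_function`, as well as the sample numbers, are assumptions made for this sketch and do not come from the text; rows are modeled as tuples.

```python
def row_add(c, d):
    """Sum of two rows of the same length (Definition 1.2)."""
    if len(c) != len(d):
        raise ValueError("rows must have the same length")
    return tuple(ci + di for ci, di in zip(c, d))


def row_scale(p, c):
    """Product of the number p and the row c (Definition 1.2)."""
    return tuple(p * ci for ci in c)


def linear_function(a):
    """Return F with F(c) = a_1 c_1 + ... + a_n c_n (Definition 1.1)."""
    a = tuple(a)

    def F(c):
        if len(c) != len(a):
            raise ValueError("row has the wrong length")
        return sum(ai * ci for ai, ci in zip(a, c))

    return F


# Illustrative data (not from the text).
c = (1, 2, 3)
d = (4, 0, -1)
F = linear_function((2, -1, 5))

print(row_add(c, d))    # (5, 2, 2)
print(row_scale(3, c))  # (3, 6, 9)
print(F(c))             # 2*1 - 1*2 + 5*3 = 15
```

For n = 1 the function returned by `linear_function((a,))` is simply direct proportionality with coefficient a.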

Theorem 1.3 A function F on the set of rows of length n is linear if and only if it
possesses the following properties:

$$F(c + d) = F(c) + F(d), \tag{1.8}$$

$$F(pc) = pF(c), \tag{1.9}$$

for all rows c, d and all numbers p.

Proof Properties (1.8) and (1.9) are the direct analogue of the well-known
conditions for direct proportionality.

The proof of properties (1.8) and (1.9) is completely obvious. Let the linear
function F associate to each row $c = (c_1, c_2, \ldots, c_n)$ the number (1.7). By the
above definition, the sum of rows $c = (c_1, \ldots, c_n)$ and $d = (d_1, \ldots, d_n)$ is the row
$c + d = (c_1 + d_1, \ldots, c_n + d_n)$, and it follows that

$$\begin{aligned}
F(c + d) &= a_1(c_1 + d_1) + \cdots + a_n(c_n + d_n)\\
&= (a_1 c_1 + a_1 d_1) + \cdots + (a_n c_n + a_n d_n)\\
&= (a_1 c_1 + \cdots + a_n c_n) + (a_1 d_1 + \cdots + a_n d_n)\\
&= F(c) + F(d),
\end{aligned}$$

which is equation (1.8). In exactly the same way, we obtain

$$F(pc) = a_1(pc_1) + \cdots + a_n(pc_n) = p(a_1 c_1 + \cdots + a_n c_n) = pF(c).$$

Let us now prove the reverse assertion: any function F on the set of rows of length
n with numerical values satisfying properties (1.8) and (1.9) is linear. To show this,
let us consider the row $e_i$ in which every entry except the $i$th is equal to zero, while
the $i$th is equal to 1, that is, $e_i = (0, \ldots, 1, \ldots, 0)$, where the 1 is in the $i$th place.
Let us set $F(e_i) = a_i$ and let us prove that for an arbitrary row $c = (c_1, \ldots, c_n)$, the
following equality is satisfied: $F(c) = a_1 c_1 + \cdots + a_n c_n$. From that we will be able
to conclude that the function F is linear.

For this, let us convince ourselves that $c = c_1 e_1 + \cdots + c_n e_n$. This is almost
obvious: let us consider what number is located at the $i$th place in the row
$c_1 e_1 + \cdots + c_n e_n$. In any row $e_k$ with $k \neq i$, there is a 0 in the $i$th place, and therefore, the
same is true for $c_k e_k$, while in the row $c_i e_i$, the element $c_i$ is located at
the $i$th place. As a result, in the complete sum $c_1 e_1 + \cdots + c_n e_n$, there is $c_i$ at the
$i$th place. This is true for arbitrary $i$, which implies that the sum under consideration
coincides with the row c.

Now let us consider $F(c)$. Using properties (1.8) and (1.9) n times, we obtain

$$\begin{aligned}
F(c) &= F(c_1 e_1) + F(c_2 e_2 + \cdots + c_n e_n) = c_1 F(e_1) + F(c_2 e_2 + \cdots + c_n e_n)\\
&= a_1 c_1 + F(c_2 e_2 + \cdots + c_n e_n) = a_1 c_1 + a_2 c_2 + F(c_3 e_3 + \cdots + c_n e_n)\\
&= \cdots = a_1 c_1 + a_2 c_2 + \cdots + a_n c_n,
\end{aligned}$$

as asserted.
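
The second half of the proof suggests a simple computational check, sketched below in Python under stated assumptions: given a function known to satisfy (1.8) and (1.9), its coefficients can be read off as $a_i = F(e_i)$. The helper names `unit_row` and `coefficients_of` and the sample function `G` are hypothetical and chosen only for illustration.

```python
def unit_row(i, n):
    """Row e_i of length n: 1 in place i (counted from 1), 0 elsewhere."""
    return tuple(1 if j == i else 0 for j in range(1, n + 1))


def coefficients_of(F, n):
    """Recover a_i = F(e_i) for i = 1, ..., n, as in the proof of Theorem 1.3."""
    return [F(unit_row(i, n)) for i in range(1, n + 1)]


# A function given by some rule that is not written in the form (1.7),
# although it does satisfy properties (1.8) and (1.9).
def G(c):
    return 3 * (c[0] + c[2]) - (c[1] - 2 * c[0])


a = coefficients_of(G, 3)
print(a)  # [5, -1, 3]

# G agrees with a_1 c_1 + a_2 c_2 + a_3 c_3 on a sample row.
c = (2, 7, -4)
print(G(c), sum(ai * ci for ai, ci in zip(a, c)))  # -9 -9
```

The agreement on one sample row is only a spot check; the proof above is what guarantees equality for every row.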



We shall soon convince ourselves of the usefulness of these properties of a linear 
function. Let us define the operations on linear functions that we shall be meeting 
in the sequel. 

Definition 1.4 Let F and G be two linear functions on the set of rows of length n.
Their sum is the function $F + G$, on the same set, defined by the equality
$(F + G)(c) = F(c) + G(c)$ for every row c. The product of the linear function F and the
number p is the function $pF$, defined by the relation $(pF)(c) = p \cdot F(c)$.


Using Theorem 1.3, we obtain that both F + G and pF are linear functions. 

We return now to the system of linear equations (1.3). Clearly, it can be written
in the form

$$\begin{cases}
F_1(x) = b_1,\\
\qquad\cdots\\
F_m(x) = b_m,
\end{cases} \tag{1.10}$$

where $F_1(x), \ldots, F_m(x)$ are linear functions defined by the relationships

$$F_i(x) = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n.$$

A row c is called a solution of the system (1.10) if on substituting c for x, all the
equations are transformed into identities, that is, $F_1(c) = b_1, \ldots, F_m(c) = b_m$.

Pay attention to the word “if”! Not every system of equations has a solution. For
example, the system

$$\begin{cases}
x_1 + x_2 + \cdots + x_{100} = 0,\\
x_1 + x_2 + \cdots + x_{100} = 1
\end{cases}$$

of two equations in one hundred unknowns clearly cannot have any solution.

Fig. 1.1 The intersection of two lines

Definition 1.5 A system possessing at least one solution is said to be consistent,
while a system with no solutions is called inconsistent. If a system is consistent
and has only one solution, then it is said to be definite, and if it has more than one
solution, it is indefinite.

A definite system is also called uniquely determined, since it has precisely one
solution.

Definite systems of equations are encountered frequently, for instance when from
external considerations it is clear that there is only one solution. For example,
suppose we wish to find the unique point lying on the lines defined by the equations
x = y and x + y = 1; see Fig. 1.1. It is clear that these lines are not parallel and
therefore have exactly one point of intersection. This means that the system
consisting of the equations of these two lines is definite. It is easy to find its unique solution
by a simple calculation. To do so, one may substitute the condition y = x into the
second equation. This yields 2x = 1, that is, x = 1/2, and since y = x, we have also
y = 1/2.

The reader has almost certainly encountered indefinite systems in secondary
school, for example, the system

$$\begin{cases}
x - 2y = 1,\\
3x - 6y = 3.
\end{cases} \tag{1.11}$$

It is obvious that the second equation is obtained by multiplying the first equation
by 3. Therefore, the system is satisfied by all x and y that satisfy the first equation.
From the first equation, we obtain 2y = x - 1, or equivalently, y = (x - 1)/2. We
can now choose an arbitrary value for x and obtain the corresponding value
y = (x - 1)/2. Our system thus has infinitely many solutions and is therefore indefinite.
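
The two examples just discussed can be checked mechanically. The following Python sketch, with a hypothetical helper `is_solution` and exact arithmetic via `fractions.Fraction`, tests a candidate row against each equation $F_i(c) = b_i$; the equation x = y is written as x - y = 0 so that it fits the form (1.3).

```python
from fractions import Fraction


def is_solution(A, b, c):
    """True if the row c satisfies every equation F_i(c) = b_i of the system."""
    return all(
        sum(aik * ck for aik, ck in zip(row, c)) == bi
        for row, bi in zip(A, b)
    )


# Definite system: x - y = 0, x + y = 1.
A1, b1 = [[1, -1], [1, 1]], [0, 1]
half = Fraction(1, 2)
print(is_solution(A1, b1, (half, half)))  # True

# Indefinite system (1.11): x - 2y = 1, 3x - 6y = 3.
A2, b2 = [[1, -2], [3, -6]], [1, 3]
for x in (Fraction(0), Fraction(1), Fraction(5, 3)):
    y = (x - 1) / 2                        # y = (x - 1)/2 for an arbitrary x
    print(x, y, is_solution(A2, b2, (x, y)))  # True for each chosen x
```

Exact fractions are used rather than floating-point numbers so that the equality tests are reliable.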
We have now seen examples of the following types of systems of equations:

(a) having no solutions (inconsistent),

(b) having a unique solution (consistent and definite),

(c) having infinitely many solutions (for example, system (1.11)).

Let us show that these three cases are the only possibilities.

Theorem 1.6 If a system of linear equations is consistent and indefinite, then it has
infinitely many solutions.

Proof By the hypothesis of the theorem, we have a system of linear equations that
is consistent and that contains more than one solution. This means that it has at
least two distinct solutions: c and d. We shall now construct an infinite number of
solutions.

To do so, we consider, for an arbitrary number p, the row $r = pc + (1 - p)d$. We
shall show first of all that the row r is also a solution. We suppose our system to be
written in the form (1.10). Then we must show that $F_i(r) = b_i$ for all $i = 1, \ldots, m$.
Using properties (1.8) and (1.9), we obtain

$$F_i(r) = F_i(pc + (1 - p)d) = pF_i(c) + (1 - p)F_i(d) = pb_i + (1 - p)b_i = b_i,$$

since c and d are solutions of the system of equations (1.10), that is,
$F_i(c) = F_i(d) = b_i$ for all $i = 1, \ldots, m$.

It remains to verify that for different numbers p we obtain different solutions.
Then we will have shown that we have infinitely many of them. Let us suppose that
two different numbers $p$ and $p'$ yield the same solution
$pc + (1 - p)d = p'c + (1 - p')d$. We observe that we can operate on rows just as on numbers in that we can
move terms from one side of the equation to the other and remove a common factor
from the terms inside parentheses. This is justified because we defined operations
on rows in terms of operations on the numbers that constitute them. As a result, we
obtain the relation $(p - p')c = (p - p')d$. Since by assumption, $p \neq p'$, we can
cancel the factor $p - p'$. On doing so, we obtain $c = d$, but by hypothesis, c and d
were distinct solutions. From this contradiction, we conclude that every choice of p
yields a distinct solution. □
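
The construction in this proof can be tried out on system (1.11). In the Python sketch below, the two starting solutions c = (1, 0) and d = (3, 1), the helper names `is_solution` and `combine`, and the particular values of p are assumptions chosen for illustration.

```python
from fractions import Fraction

# System (1.11): x - 2y = 1, 3x - 6y = 3.
A = [[1, -2], [3, -6]]
b = [1, 3]


def is_solution(c):
    """True if the row c satisfies both equations of system (1.11)."""
    return all(sum(a * x for a, x in zip(row, c)) == bi for row, bi in zip(A, b))


def combine(p, c, d):
    """The row r = p*c + (1 - p)*d from the proof of Theorem 1.6."""
    return tuple(p * ci + (1 - p) * di for ci, di in zip(c, d))


c = (Fraction(1), Fraction(0))   # one solution
d = (Fraction(3), Fraction(1))   # a different solution

ps = [Fraction(k, 4) for k in range(-4, 9)]   # 13 distinct values of p
rows = [combine(p, c, d) for p in ps]

print(all(is_solution(r) for r in rows))  # True: every r is again a solution
print(len(set(rows)) == len(ps))          # True: distinct p give distinct rows
```

A finite list of values of p cannot of course exhibit infinitely many solutions; it only illustrates the two facts the proof establishes for every p.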


1.2 Gaussian Elimination 

Our goal now is to demonstrate a method of determining to which of the three types
mentioned in the previous section a given system of linear equations belongs, that is,
whether it is consistent, and if so, whether it is definite. If it is consistent and definite,
then we would like to find its unique solution, and if it is consistent and indefinite,
then we want to write down its solutions in some useful form. There exists a simple
method that is effective in each concrete situation. It is called Gaussian elimination,
or Gauss’s method, and we now present it. We are going to be dealing here with
proof by induction. That is, beginning with the simplest case, with m = 1 equations,
we then move on to the case m = 2, and so on, so that in considering the general
case of a system of m linear equations, we shall assume that we have proved the
result for systems with fewer than m equations.

The method of Gaussian elimination is based on the idea of replacing the given
system of linear equations with another system having the same solutions. Let us
consider along with system (1.10) another system of linear equations in the same
number of unknowns:

$$\begin{cases}
G_1(x) = f_1,\\
\qquad\cdots\\
G_l(x) = f_l,
\end{cases} \tag{1.12}$$

where $G_1(x), \ldots, G_l(x)$ are some other linear functions in n unknowns. The system
(1.12) is said to be equivalent to system (1.10) if both systems have exactly the same
solutions, that is, any solution of system (1.10) is also a solution of system (1.12),
and vice versa.

The idea behind Gaussian elimination is to use certain elementary row operations
on the system that replace a system with an equivalent but simpler system for which
the answers to the questions about solutions posed above are obvious.


Definition 1.7 An elementary row operation of type I on system (1.3) or (1.10)
consists in the transposition of two rows. So that there will be no uncertainty about
what we mean, let us be precise: under this row operation, all the equations of the
system other than the $i$th and the $k$th are left unchanged, while the $i$th and $k$th
exchange places.


Thus the number of elementary row operations of type I is equal to the number
of pairs $i, k$, $i \neq k$, that is, the number of combinations of m things taken 2 at a time.

Definition 1.8 An elementary row operation of type II consists in the replacement
of the given system by another in which all equations except the $i$th remain as
before, and to the $i$th equation is added c times the $k$th equation. As a result, the $i$th
equation in system (1.3) takes the form

$$(a_{i1} + ca_{k1})x_1 + (a_{i2} + ca_{k2})x_2 + \cdots + (a_{in} + ca_{kn})x_n = b_i + cb_k. \tag{1.13}$$

An elementary row operation of type II depends on the choice of the indices $i$
and $k$ and the number c, and so there are infinitely many row operations of this type.

Theorem 1.9 Application of an elementary row operation of type I or II results in
a system that is equivalent to the original one.

Proof The assertion is completely obvious in the case of an elementary row
operation of type I: whatever solutions a system may have cannot depend on the
numbering of its equations (that is, on the ordering of the system (1.3) or (1.10)). We
could even not number the equations at all, but write each of them, for example, on
a separate piece of paper.

In the case of an elementary row operation of type II, the assertion is also fairly
obvious. Any solution $c = (c_1, \ldots, c_n)$ of the first system after the substitution
satisfies all the equations obtained under this elementary row operation except possibly
the $i$th, simply because they are identical to the equations of the original system.
It remains to settle the question for the $i$th equation. Since c was a solution of the
original system, we have the following equalities:

$$\begin{aligned}
a_{i1} c_1 + a_{i2} c_2 + \cdots + a_{in} c_n &= b_i,\\
a_{k1} c_1 + a_{k2} c_2 + \cdots + a_{kn} c_n &= b_k.
\end{aligned}$$

After adding c times the second of these equations to the first, we obtain equality
(1.13) for $x_1 = c_1, \ldots, x_n = c_n$. This means that c satisfies the $i$th equation of the
new system; that is, c is a solution.

It remains to prove the reverse assertion, that any solution of the system obtained
by a row operation of type II is a solution of the original system. To this end, we
observe that adding $-c$ times the $k$th equation to equation (1.13) yields the $i$th
equation of the original system. That is, the original system is obtained from the
new system by an elementary row operation of type II using the factor $-c$. Thus,
the previous line of argument shows that any solution of the new system obtained by
an elementary row operation of type II is also a solution of the original system. □
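
The two kinds of row operation can be sketched in Python on the augmented matrix (1.5), stored as a list of rows. The function names `swap_rows`, `add_multiple`, and `satisfies`, the 0-based indices, and the sample system are assumptions made for this sketch.

```python
from fractions import Fraction


def swap_rows(M, i, k):
    """Type I: exchange equations i and k (0-based); returns a new matrix."""
    N = [row[:] for row in M]
    N[i], N[k] = N[k], N[i]
    return N


def add_multiple(M, i, k, c):
    """Type II: add c times equation k to equation i (i != k); returns a new matrix."""
    if i == k:
        raise ValueError("i and k must be different")
    N = [row[:] for row in M]
    N[i] = [x + c * y for x, y in zip(M[i], M[k])]
    return N


def satisfies(M, sol):
    """Check a row against an augmented matrix (last column holds the constants)."""
    return all(sum(a * s for a, s in zip(row[:-1], sol)) == row[-1] for row in M)


# Augmented matrix of x + y = 1, x - y = 0 (illustrative).
M = [[Fraction(v) for v in row] for row in [[1, 1, 1], [1, -1, 0]]]
sol = (Fraction(1, 2), Fraction(1, 2))

N = add_multiple(M, 1, 0, -1)                 # second equation becomes -2y = -1
print(satisfies(M, sol), satisfies(N, sol))   # True True
print(add_multiple(N, 1, 0, 1) == M)          # True: the factor -c undoes the factor c
```

The last line mirrors the second half of the proof: applying the same operation with the opposite factor restores the original system.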

Let us now consider Gauss’s elimination method. As our first operation, let us
perform on system (1.3) an elementary row operation of type I by transposing the
first equation and any other in which $x_1$ appears with a coefficient different from 0.
If the first equation possesses this property, then no such transposition is necessary.
Now, it can happen that $x_1$ appears in all the equations with coefficient 0 (that is, $x_1$
does not appear at all in the equations). In that case, we can change the numbering
of the unknowns and designate by $x_1$ some unknown that appears in some equation
with nonzero coefficient. After this completely elementary transformation, we will
have obtained that $a_{11} \neq 0$. For completeness, we should examine the extreme case
in which all unknowns appear in all equations with zero coefficients. But in that
case, the situation is trivial: all the equations take the form $0 = b_i$. If all the $b_i$ are 0,
then we have the identities 0 = 0, which are satisfied for all values assigned to the $x_i$,
that is, the system is consistent and indefinite. But if a single $b_i$ is not equal to
zero, then that $i$th equation is not satisfied for any values of the unknowns, and the
system is inconsistent.

Now let us perform a sequence of elementary row operations of type II, adding
to the second, third, and so on up to the $m$th equation the first equation multiplied
respectively by some numbers $c_2, c_3, \ldots, c_m$ in order to make the coefficient of $x_1$
in each of these equations equal to zero. It is clear that to do this, we must set
$c_2 = -a_{21}a_{11}^{-1}$, $c_3 = -a_{31}a_{11}^{-1}$, $\ldots$, $c_m = -a_{m1}a_{11}^{-1}$, which is possible because we
have ensured by hypothesis that $a_{11} \neq 0$. As a result, the unknown $x_1$ appears in
none of the equations except the first. We have thereby obtained a system that can
be written in the following form:

$$\begin{cases}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1,\\
\phantom{a_{11} x_1 + {}} a'_{22} x_2 + \cdots + a'_{2n} x_n = b'_2,\\
\qquad\cdots\cdots\cdots\\
\phantom{a_{11} x_1 + {}} a'_{m2} x_2 + \cdots + a'_{mn} x_n = b'_m.
\end{cases} \tag{1.14}$$


Since system (1.14) was obtained from the original system (1.3) by elementary row
operations, it follows from Theorem 1.9 that the two systems are equivalent, that
is, the solution of an arbitrary system (1.3) has been reduced to the solution of the
simpler system (1.14). That is precisely the idea behind the method of Gaussian
elimination. It in fact reduces the problem to the solution of a system of $m - 1$
equations:

$$\begin{cases}
a'_{22} x_2 + \cdots + a'_{2n} x_n = b'_2,\\
\qquad\cdots\cdots\cdots\\
a'_{m2} x_2 + \cdots + a'_{mn} x_n = b'_m.
\end{cases} \tag{1.15}$$

Now if system (1.15) is inconsistent, then clearly, the larger system (1.14) is also
inconsistent. If system (1.15) is consistent and we know the solution, then we can
obtain all solutions of system (1.14). Namely, if $x_2 = c_2, \ldots, x_n = c_n$ is any solution
of system (1.15), then we have only to substitute these values into the first equation
of the system (1.14). As a result, the first equation of system (1.14) takes the form

$$a_{11} x_1 + a_{12} c_2 + \cdots + a_{1n} c_n = b_1, \tag{1.16}$$

and we have one linear equation for the remaining unknown $x_1$, which can be solved
by the well-known formula

$$x_1 = a_{11}^{-1}(b_1 - a_{12} c_2 - \cdots - a_{1n} c_n),$$

which can be accomplished because $a_{11} \neq 0$. This reasoning is applicable in
particular to the case m = 1 (if we compare Gauss’s method with the method of proof by
induction, then this gives us the base case of the induction).
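
One step of the reduction from (1.3) to (1.14), followed by the formula for $x_1$, might be sketched in Python as follows. The function names, the use of an augmented matrix as a list of rows, and the sample system are assumptions for this sketch; the case in which $x_1$ has coefficient 0 in every equation is reported as an error rather than handled by renumbering the unknowns.

```python
from fractions import Fraction


def eliminate_first_unknown(M):
    """One Gaussian step on an augmented matrix: returns system (1.14)."""
    M = [[Fraction(v) for v in row] for row in M]
    pivot = next((i for i, row in enumerate(M) if row[0] != 0), None)
    if pivot is None:
        raise ValueError("x_1 has coefficient 0 everywhere; renumber the unknowns")
    M[0], M[pivot] = M[pivot], M[0]                     # operation of type I
    for i in range(1, len(M)):
        c = -M[i][0] / M[0][0]                          # c_i = -a_i1 * a_11^(-1)
        M[i] = [x + c * y for x, y in zip(M[i], M[0])]  # operation of type II
    return M


def solve_for_x1(first_row, tail):
    """x_1 = a_11^(-1) (b_1 - a_12 c_2 - ... - a_1n c_n), given (c_2, ..., c_n)."""
    a11, rest, b1 = first_row[0], first_row[1:-1], first_row[-1]
    return (b1 - sum(a * c for a, c in zip(rest, tail))) / a11


# Illustrative system in unknowns x, y: 2y = 4, x + y = 3, 2x - y = 0.
M14 = eliminate_first_unknown([[0, 2, 4], [1, 1, 3], [2, -1, 0]])
print(M14 == [[1, 1, 3], [0, 2, 4], [0, -3, -6]])  # True

# The smaller system (1.15) is 2y = 4, -3y = -6, so y = 2; then x follows.
print(solve_for_x1(M14[0], [Fraction(2)]))         # 1
```

Rows 2 through m of the result, with their first entry dropped, form the smaller system (1.15).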

Thus the method of Gaussian elimination reduces the study of an arbitrary system
of m equations in n unknowns to that of a system of $m - 1$ equations in $n - 1$
unknowns. We shall illustrate this after proving several general theorems about such
systems.


Theorem 1.10 If the number of unknowns in a system of equations is greater than
the number of equations, then the system is either inconsistent or indefinite.


In other words, by Theorem 1.6, we know that the number of solutions of an
arbitrary system of linear equations is 0, 1, or infinity. If the number of unknowns
in a system is greater than the number of equations, then Theorem 1.10 asserts that
the only possible number of solutions is 0 or infinity.

Proof of Theorem 1.10 We shall prove the theorem by induction on the number m
of equations in the system. Let us begin by considering the case m = 1, in which
case we have a single equation:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b_1. \tag{1.17}$$

We have n > 1 by hypothesis, and if even one $a_i$ is nonzero, then we can number
the unknowns in such a way that $a_1 \neq 0$. We then have the case of equation (1.16).
We saw that in this case, the system was consistent and indefinite.

But there remains one case to consider, that in which $a_i = 0$ for all $i = 1, \ldots, n$.
If in this case $b_1 \neq 0$, then clearly we have an inconsistent “system” (consisting of
a single inconsistent equation). If, however, $b_1 = 0$, then a solution consists of an
arbitrary sequence of numbers $x_1 = c_1, x_2 = c_2, \ldots, x_n = c_n$, that is, the “system”
(consisting of the equation 0 = 0) is indefinite.

Now let us consider the case of m > 1 equations. We employ the method of
Gaussian elimination. That is, after writing down our system in the form (1.3), we
transform it into the equivalent system (1.14). The number of unknowns in the
system (1.15) is $n - 1$, and therefore larger than the number of equations $m - 1$, since
by the hypothesis of the theorem, n > m. This means that the hypothesis of the
theorem is satisfied for system (1.15), and by induction, we may conclude that the
theorem is valid for this system. If system (1.15) is inconsistent, then all the more
so is the larger system (1.14). If it is indefinite, that is, has more than one solution,
then in the initial system there will be more than one solution; that is, system (1.3)
will be indefinite. □

Let us now focus attention on an important special case of Theorem 1.10. A
system of linear equations is said to be homogeneous if all the constant terms are equal
to zero, that is, in (1.3), we have $b_1 = \cdots = b_m = 0$. A homogeneous system is
always consistent: it has the obvious solution $x_1 = \cdots = x_n = 0$. Such a solution is
called a null solution. We obtain the following corollary to Theorem 1.10.

Corollary 1.11 If in a homogeneous system, the number of unknowns is greater
than the number of equations, then the system has a solution that is different from
the null solution.

If we denote (as we have been doing) the number of unknowns by n and the
number of equations by m, then we have considered the case n > m. Theorem 1.10
asserts that for n > m, a system of linear equations cannot have a unique solution.
Now we shall move on to consider the case n = m. We have the following rather
surprising result.

Theorem 1.12 If in a system of linear equations, the number of unknowns is equal
to the number of equations, then the property of having a unique solution depends
only on the values of the coefficients and not on the values of the constant terms.

Proof The result is easily obtained by Gaussian elimination. Let the system be
written in the form (1.3), with n = m. Let us deal separately with the case that all the
coefficients are zero (in all equations), in which case the system cannot be uniquely
determined regardless of the constants $b_i$. Indeed, if even a single $b_i$ is not equal to
zero, then the $i$th equation gives an inconsistent equation; and if all the $b_i$ are zero,
then every choice of values for the $x_i$ gives a solution. That is, the system is
indefinite.

Let us prove Theorem 1.12 by induction on the number of equations (m = n). We
have already considered the case in which all the coefficients $a_{ik}$ are equal to zero.
We may therefore assume that among the coefficients $a_{ik}$, some are nonzero and
the system can be written in the equivalent form (1.14). But the solutions to (1.14)
are completely determined by system (1.15). In system (1.15), again the number of
equations is equal to the number of unknowns (both equal to $m - 1$). Therefore,
reasoning by induction, we may assume that the theorem has been proved for this
system. However, we have seen that consistency or definiteness of system (1.14)
was the same as that for system (1.15). In conclusion, it remains to observe that the
coefficients $a'_{ik}$ of system (1.15) are obtained from the coefficients of system (1.3)
by the formulas

$$a'_{2k} = a_{2k} - \frac{a_{21}}{a_{11}} a_{1k}, \quad a'_{3k} = a_{3k} - \frac{a_{31}}{a_{11}} a_{1k}, \quad \ldots, \quad a'_{mk} = a_{mk} - \frac{a_{m1}}{a_{11}} a_{1k}.$$

Thus the question of a unique solution is determined by the coefficients of the
original system (1.3). □

Theorem 1.12 can be reformulated as follows: if the number of equations is equal
to the number of unknowns and the system has a unique solution for certain values
of the constant terms $b_i$, then it has a unique solution for all possible values of the
constant terms. In particular, as a choice of these “certain” values we may take all
the constants to be zero. Then we obtain a system with the same coefficients for the
unknowns as in system (1.3), but now the system is homogeneous. Such a system is
called the homogeneous system associated with system (1.3). We see, then, that if
the number of equations is equal to the number of unknowns, then the system has
a unique solution if and only if its associated system has a unique solution. Since
a homogeneous system always has the null solution, its having a unique solution is
equivalent to the absence of nonnull solutions, and we obtain the following result.

Corollary 1.13 If in a system of linear equations, the number of equations is equal
to the number of unknowns, then it has a unique solution if and only if its associated
homogeneous system has no solutions other than the null solution.
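
Theorem 1.12 and Corollary 1.13 say that, for a square system, unique solvability can be decided from the coefficient matrix (1.4) alone. The Python sketch below is one way to do this, under stated assumptions: the name `has_unique_solution` is hypothetical, and instead of renumbering unknowns as in the text, it uses the common variant that looks for a nonzero coefficient column by column and reports failure as soon as some column offers none among the remaining equations.

```python
from fractions import Fraction


def has_unique_solution(A):
    """Decide, from the square coefficient matrix A alone, whether the system
    with these coefficients has a unique solution (for any constant terms)."""
    A = [[Fraction(v) for v in row] for row in A]
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the number of equations must equal the number of unknowns")
    for col in range(n):
        pivot = next((i for i in range(col, n) if A[i][col] != 0), None)
        if pivot is None:
            return False                 # this unknown cannot be determined uniquely
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(col + 1, n):
            factor = A[i][col] / A[col][col]
            # a'_ik = a_ik - (a_i1 / a_11) * a_1k, applied to the current subsystem
            A[i] = [x - factor * y for x, y in zip(A[i], A[col])]
    return True


print(has_unique_solution([[1, -1], [1, 1]]))   # True  (x - y = ..., x + y = ...)
print(has_unique_solution([[1, -2], [3, -6]]))  # False (coefficients of system (1.11))
```

Note that the constant terms never enter the computation, which is exactly the content of Theorem 1.12.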

This result is unexpected, since from the absence of a solution different from the
null solution, it derives the existence and uniqueness of the solution to a different
system (with different constant terms). In functional analysis, this result is called
the Fredholm alternative.¹


¹ More precisely, the Fredholm alternative comprises several assertions, one of which is analogous
to the one established above.

In order to focus on the theory behind the Gaussian method, we emphasized its
“inductive” character: it reduces the study of a system of linear equations to an
analogous system, but with fewer equations and unknowns. It is understood that in
concrete examples, we must repeat the process, using this latter system and
continuing until the process stops (that is, until it can no longer be applied). Now let us
make clear for ourselves the form that the resulting system will take.

When we transform system (1.3) into the equivalent system (1.14), it can happen
that not all the unknowns $x_2, \ldots, x_n$ enter into the corresponding system (1.15), that
is, some of the unknowns may have zero coefficients in all the equations. Moreover,
it was not easy to surmise this from the original system (1.3). Let us denote by k
the first index of the unknown that appears with coefficients different from zero in at
least one equation of system (1.15). It is clear that k > 1. We can now apply the same
operations to this system. As a result, we obtain the following equivalent system:

$$\begin{cases}
a_{11} x_1 + \cdots\cdots\cdots + a_{1n} x_n = b_1,\\
\qquad a'_{2k} x_k + \cdots\cdots + a'_{2n} x_n = b'_2,\\
\qquad\qquad a''_{3l} x_l + \cdots + a''_{3n} x_n = b''_3,\\
\qquad\qquad\qquad\cdots\cdots\cdots\\
\qquad\qquad a''_{ml} x_l + \cdots + a''_{mn} x_n = b''_m.
\end{cases}$$

Here we have already chosen $l > k$ such that in the system obtained by removing
the first two equations, the unknown $x_l$ appears with a coefficient different from
zero in at least one equation. In this case we will have $a_{11} \neq 0$, $a'_{2k} \neq 0$, $a''_{3l} \neq 0$, and
$l > k > 1$.

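We shall repeat this process as long as possible. When shall we be forced to stop?
We stop after having applied the elementary operations up to the point (let us say
the $r$th equation in which $x_s$ is the first unknown with nonzero coefficient) at which
we have reduced to zero all the coefficients of all subsequent unknowns in all the
remaining equations, that is, from the $(r + 1)$st to the $m$th. The system then has the
following form:

$$\begin{cases}
\bar{a}_{11} x_1 + \cdots\cdots\cdots\cdots + \bar{a}_{1n} x_n = b_1,\\
\qquad \bar{a}_{2k} x_k + \cdots\cdots\cdots + \bar{a}_{2n} x_n = b_2,\\
\qquad\qquad \bar{a}_{3l} x_l + \cdots\cdots + \bar{a}_{3n} x_n = b_3,\\
\qquad\qquad\qquad\cdots\cdots\cdots\\
\qquad\qquad\qquad \bar{a}_{rs} x_s + \cdots + \bar{a}_{rn} x_n = b_r,\\
\qquad\qquad\qquad\qquad\qquad 0 = b_{r+1},\\
\qquad\qquad\qquad\qquad\qquad \cdots\\
\qquad\qquad\qquad\qquad\qquad 0 = b_m.
\end{cases} \tag{1.18}$$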


Here $1 < k < l < \cdots < s$.

It can happen that r = m, and therefore, there will be no equations of the form
$0 = b_i$ in system (1.18). But if r < m, then it can happen that $b_{r+1} = 0, \ldots, b_m = 0$,
and it can finally be the case that one of the numbers $b_{r+1}, \ldots, b_m$ is different from
zero.

Definition 1.14 System (1.18) is said to be in (row) echelon form. The same
terminology is applied to the matrix of such a system.

Theorem 1.15 Every System oflinear équations is équivalent to a System in échelon 
form (1.18). 

P roof Since we transformed the initial System into the form (1.18) using a sequence 
of elementary row operations, it follows from Theorem 1.9 that System (1.18) is 
équivalent to the initial System. □ 
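The reduction used in this proof can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `echelon_form` is hypothetical, the system is passed as an augmented matrix with the constant terms in the last column, and exact rational arithmetic from the standard `fractions` module is our choice, made so that tests for zero are reliable.

```python
from fractions import Fraction


def echelon_form(rows):
    """Bring an augmented matrix [A | b] to row echelon form.

    Only two elementary operations are used: interchanging two equations
    and adding a multiple of one equation to another.
    Returns the reduced matrix and the columns of the leading unknowns.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n_unknowns = len(m[0]) - 1
    pivots = []
    r = 0  # index of the next equation that will receive a leading unknown
    for col in range(n_unknowns):
        # look for an equation, at position r or below, containing this unknown
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue  # the unknown is absent from all remaining equations
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return m, pivots


# Three equations in three unknowns; the last column holds the constant terms.
system = [[1, 2, 1, 4],
          [2, 4, 0, 6],
          [1, 2, 3, 6]]
reduced, pivots = echelon_form(system)
```

In this sample system the second unknown never becomes a leading unknown, so the reduced system has $r = 2$ nonzero equations and one equation of the form $0 = 0$.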

Since any system of the form (1.3) is equivalent to system (1.18) in echelon
form, questions about consistency and definiteness of systems can be answered by
studying systems in echelon form.

Let us begin with the question of consistency. It is clear that if system (1.18)
contains equations $0 = \bar b_k$ with $\bar b_k \neq 0$, then such a system is inconsistent, since the
equality $0 = \bar b_k$ cannot be satisfied by any values of the unknowns. Let us show that
if there are no such equations in system (1.18), then the system is consistent. Thus
we now assume that in system (1.18), the last $m - r$ equations have been converted
into the identities $0 = 0$.

Let us call the unknowns $x_1, x_k, x_l, \ldots, x_s$ that begin the first, second, third, ...,
$r$th equations of system (1.18) principal, and the rest of the unknowns (if there are
any) we shall call free. Since each of the first $r$ equations of system (1.18) begins with
its own principal unknown, the number of principal unknowns is equal to $r$. We recall that
we have assumed $\bar b_{r+1} = \cdots = \bar b_m = 0$.

Let us assign arbitrary values to the free unknowns and substitute them in the
equations of system (1.18). Since the $r$th equation contains only one principal
unknown $x_s$, and that with the coefficient $\bar a_{rs}$, which is different from zero, we obtain
for $x_s$ one equation in one unknown, which has a unique solution. Substituting this
solution for $x_s$ into the equation above it, we obtain for that equation’s principal
unknown again one equation in one unknown, which also has a unique solution.
Continuing in this way, moving from bottom to top in system (1.18), we see that the
values of the principal unknowns are determined uniquely for an arbitrary
assignment of the free unknowns. We have thus proved the following theorem.

Theorem 1.16 For a system of linear equations to be consistent, it is necessary and
sufficient, after it has been brought into echelon form, that there be no equations of
the form $0 = \bar b_k$ with $\bar b_k \neq 0$. If this condition is satisfied, then it is possible to assign
arbitrary values to the free unknowns, while the values of the principal unknowns
(for each given set of values for the free unknowns) are determined uniquely from
the system.

Let us now explain when a system will be definite on the assumption that the
condition of consistency that we have been investigating is satisfied. This question
is easily answered on the basis of Theorem 1.16. Indeed, if there are free unknowns
in system (1.18), then the system is certainly not definite, since we may give an
arbitrary assignment to each of the free unknowns, and by Theorem 1.16, the assignment
of principal unknowns is then determined by the system. On the other hand, if there
are no free unknowns, then all the unknowns are principal. By Theorem 1.16, they
are uniquely determined by the system, which means that the system is definite.
Consequently, a necessary and sufficient condition for definiteness is that there be
no free unknowns in system (1.18). This, in turn, is equivalent to all unknowns in the
system being principal. But that, clearly, is equivalent to the equality $r = n$, since $r$
is the number of principal unknowns and $n$ is the total number of unknowns. Thus
we have proved the following assertion.

Theorem 1.17 For a consistent system (1.3) to be definite, it is necessary and
sufficient that for system (1.18), after it has been brought into echelon form, we have the
equality $r = n$.
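The criteria of Theorems 1.16 and 1.17, together with the bottom-to-top substitution described above, can be tried out on small systems that are already in echelon form. The following Python sketch is an illustration only: the name `analyze_echelon`, the labels it returns, and the convention that unassigned free unknowns receive the value 0 are our own assumptions.

```python
from fractions import Fraction


def analyze_echelon(rows, free_values=None):
    """Classify a system given in echelon form [A | b]; if consistent, solve it.

    free_values maps the (0-based) index of a free unknown to its assigned
    value; free unknowns that are not listed are set to 0.
    """
    free_values = free_values or {}
    n = len(rows[0]) - 1
    pivots = []  # pairs (equation index, index of its principal unknown)
    for i, row in enumerate(rows):
        lead = next((j for j in range(n) if row[j] != 0), None)
        if lead is None:
            if row[n] != 0:
                return "inconsistent", None  # an equation 0 = b with b != 0
        else:
            pivots.append((i, lead))
    principal = {j for _, j in pivots}
    x = [None if j in principal else Fraction(free_values.get(j, 0))
         for j in range(n)]
    # back substitution: from the r-th equation up to the first
    for i, lead in reversed(pivots):
        row = rows[i]
        known = sum(Fraction(row[j]) * x[j] for j in range(lead + 1, n))
        x[lead] = (Fraction(row[n]) - known) / Fraction(row[lead])
    kind = "definite" if len(pivots) == n else "indefinite"  # r = n or r < n
    return kind, x


# r = 2 principal unknowns, n = 3 unknowns: the second unknown is free.
kind, x = analyze_echelon([[1, 2, 1, 4],
                           [0, 0, -2, -2],
                           [0, 0, 0, 0]], free_values={1: 5})
```

Changing the value assigned to the free unknown changes the principal unknowns accordingly, which is exactly why a system with $r < n$ cannot be definite.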

Remark 1.18 Any system of $n$ equations in $n$ unknowns (that is, with $m = n$)
brought into echelon form can be written in the form

$$
\begin{aligned}
\bar a_{11} x_1 + \bar a_{12} x_2 + \cdots + \bar a_{1n} x_n &= \bar b_1, \\
\bar a_{22} x_2 + \cdots + \bar a_{2n} x_n &= \bar b_2, \\
\cdots \\
\bar a_{nn} x_n &= \bar b_n
\end{aligned}
\tag{1.19}
$$

(however, not every system of the form (1.19) is in echelon form, since some of the
$\bar a_{ii}$ can be zero). Indeed, the form (1.19) indicates that in the system, the $k$th equation
does not depend on the unknowns $x_i$ for $i < k$, and this condition is automatically
satisfied for a system in echelon form.

A system in the form (1.19) is said to be in upper triangular form. The same
terminology is applied to the matrix of system (1.19).

From this observation, we can state Theorem 1.17 in a different form for the
case $m = n$. The condition $r = n$ means that all the unknowns $x_1, x_2, \ldots, x_n$ are
principal, and that means that in system (1.19), the coefficients satisfy
$\bar a_{11} \neq 0, \ldots, \bar a_{nn} \neq 0$. This proves the following corollary.

Corollary 1.19 System (1.3) in the case $m = n$ is consistent and definite if and
only if after being brought into echelon form, we obtain the upper triangular system
(1.19) with coefficients $\bar a_{11} \neq 0$, $\bar a_{22} \neq 0$, ..., $\bar a_{nn} \neq 0$.

We see that this condition is independent of the constant terms, and we thereby
obtain another proof of Theorem 1.12 (though it is based on the same idea of the
method of Gaussian elimination).

Fig. 1.2 Graph of a polynomial passing through a given set of points



1.3 Examples* 

We shall now give some examples of applications of the Gaussian method and with
its aid obtain some new results for the investigation of concrete problems.

Example 1.20 The expression

$$ f = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, $$

where the $a_i$ are certain numbers, is called a polynomial in the unknown $x$. If
$a_n \neq 0$, then the number $n$ is called the degree of the polynomial $f$. If we
replace the unknown $x$ by some numerical value $x = c$, we obtain the number
$a_0 + a_1 c + a_2 c^2 + \cdots + a_n c^n$, which is called the value of the polynomial at $x = c$;
it is denoted by $f(c)$.

The following type of problem is frequently encountered: We are given two
collections of numbers $c_1, \ldots, c_r$ and $k_1, \ldots, k_r$ such that $c_1, \ldots, c_r$ are distinct. Is it
possible to find a polynomial $f$ such that $f(c_i) = k_i$ for $i = 1, \ldots, r$? The
process of constructing such a polynomial is called interpolation. This type of problem
is encountered when values of a certain variable are measured experimentally (for
example, temperature) at different moments of time $c_1, \ldots, c_r$. If such an
interpolation is possible, then the polynomial thus obtained provides a single formula for
temperature that coincides with the experimentally measured values.

We can provide a more graphic depiction of the problem of interpolation by
stating that we are seeking a polynomial $f(x)$ of degree $n$ such that the graph of
the function $y = f(x)$ passes through the given points $(c_i, k_i)$ in the Cartesian plane
for $i = 1, \ldots, r$ (see Fig. 1.2).

Let us write down the conditions of the problem explicitly:

$$
\begin{aligned}
a_0 + a_1 c_1 + \cdots + a_n c_1^n &= k_1, \\
a_0 + a_1 c_2 + \cdots + a_n c_2^n &= k_2, \\
\cdots \\
a_0 + a_1 c_r + \cdots + a_n c_r^n &= k_r.
\end{aligned}
\tag{1.20}
$$

For the desired polynomial $f$ we obtain relationship (1.20), which is a system of
linear equations. The numbers $a_0, \ldots, a_n$ are the unknowns. The number of unknowns
is $n + 1$ (the numeration begins here not with the usual $a_1$, but with $a_0$). The
numbers 1 and $c_i^k$ are the coefficients of the unknowns, and $k_1, \ldots, k_r$ are the constant
terms.

If $r = n + 1$, then we are in the situation of Theorem 1.12 and its corollary.
Therefore, for $r = n + 1$, the interpolation problem has a solution, and a unique one,
if and only if the homogeneous system associated with (1.20) has only the null solution.
This associated system can be written in the form

$$
\begin{aligned}
f(c_1) &= 0, \\
f(c_2) &= 0, \\
\cdots \\
f(c_r) &= 0.
\end{aligned}
\tag{1.21}
$$

A number $c$ for which $f(c) = 0$ is called a root of the polynomial $f$. A simple
theorem of algebra (a corollary of what is known as Bézout’s theorem) states that
a polynomial cannot have more distinct roots than its degree (except in the case
that all the $a_i$ are equal to zero, in which case the degree is undefined). This means
(if the numbers $c_i$ are distinct, which is a natural assumption) that for $r = n + 1$,
equations (1.21) can be satisfied only if all the $a_i$ are zero. We obtain that under these
conditions, system (1.20) (that is, the interpolation problem) has a solution, and the
solution is unique. We note that it is not particularly difficult to obtain an explicit
formula for the coefficients of the polynomial $f$. This will be done in Sects. 2.4
and 2.5.
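For small $n$ the interpolation problem can simply be solved by the Gaussian method applied to system (1.20). The Python sketch below does this with exact rational arithmetic; the names `interpolate` and `evaluate` and the sample points are our own, and the code assumes, as in the text, that $r = n + 1$ and that the $c_i$ are distinct.

```python
from fractions import Fraction


def interpolate(points):
    """Coefficients [a_0, ..., a_n] of the polynomial through n + 1 points
    (c_i, k_i) with distinct c_i, found by solving system (1.20)."""
    size = len(points)
    # row i of the augmented matrix: [1, c_i, c_i^2, ..., c_i^n | k_i]
    m = [[Fraction(c) ** p for p in range(size)] + [Fraction(k)]
         for c, k in points]
    for col in range(size):
        pivot = next(i for i in range(col, size) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for i in range(size):
            if i != col:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[col])]
    return [row[-1] for row in m]


def evaluate(coeffs, c):
    """Value f(c) = a_0 + a_1 c + ... + a_n c^n."""
    return sum(a * Fraction(c) ** p for p, a in enumerate(coeffs))


points = [(0, 1), (1, 3), (2, 7)]
coeffs = interpolate(points)
```

For these three points the unique polynomial of degree at most 2 is $f = 1 + x + x^2$, and one can check directly that it takes the prescribed values.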


The following example is somewhat more difficult. 

Example 1.21 Many questions in physics (such as the distribution of heat in a solid
body if a known temperature is maintained on its surface, or the distribution of
electric charge on a body if a known charge distribution is maintained on its surface, and
so on) lead to a single differential equation, called the Laplace equation. It is a partial
differential equation, which we do not need to describe here. It suffices to mention
one consequence, called the mean value property, according to which the value of
the unknown quantity (satisfying the Laplace equation) is equal at every point to
the arithmetic mean of its values at “nearby” points. We need not make precise here
just what we mean by “nearby points” (suffice it to say that there are infinitely many
of them, and this property is defined in terms of the integral). We will, however,
present a method for an approximate solution of the Laplace equation. Solely for
the purpose of simplifying the presentation, we shall consider the two-dimensional
case instead of the three-dimensional situation described above. That is, instead of
a three-dimensional body and its surface, we shall examine a two-dimensional
figure and its boundary; see Fig. 1.3(a). To construct an approximate solution in the
plane, we form a lattice of identical small squares (the smaller the squares, the
better the approximation), and the contour of the figure will be replaced by the closest
approximation to it consisting of sides of the small squares; see Fig. 1.3(b).

Fig. 1.3 Constructing an approximate solution to the Laplace equation

Fig. 1.4 The “nearby vertices” to a are the points b, c, d, e

We examine the values of the unknown quantity (temperature, charge, etc.) only
at the vertices of the small squares. Now the concept of “nearby points” acquires
an unambiguous meaning: each vertex of a square of the lattice has exactly four
nearby points, namely the “nearby” vertices. For example, in Fig. 1.4, the point $a$
has nearby vertices $b, c, d, e$.

We consider as given some quantities $x_a$ for all the vertices $a$ of the squares
intersecting the boundary (the thick straight lines in Fig. 1.3(b)), and we seek such values
for the vertices of the squares located inside this contour. Now an approximate
analogue of the mean value property for the point $a$ of Fig. 1.4 is the relationship



$$ x_a = \frac{x_b + x_c + x_d + x_e}{4}. \tag{1.22} $$


There are thus as many unknowns as there are vertices inside the contour, and to
each such vertex there corresponds an equation of type (1.22). This means that we
have a system of linear equations in which the number of equations is equal to the
number of unknowns. If one of the vertices $b, c, d, e$ is located on the contour, then
the corresponding quantity, one of $x_b, x_c, x_d, x_e$, must be assigned, and equation
(1.22) in this case is inhomogeneous. An assertion from the theory of linear
equations that we shall prove is that regardless of how we assign values on the boundary
of the figure, the associated system of linear equations always has a unique solution.
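To make the system of equations (1.22) concrete, here is a small Python sketch that assembles and solves it for a hypothetical figure: a square whose contour consists of the lattice vertices $(i, j)$ with $i$ or $j$ equal to 0 or 3, so that there are four interior vertices. The function name `solve_lattice`, the coordinates, and the boundary values (the value $i$ at the vertex $(i, j)$) are all our own assumptions, chosen so that the answer is easy to check by hand.

```python
from fractions import Fraction


def solve_lattice(interior, boundary):
    """Solve x_a = (x_b + x_c + x_d + x_e) / 4 for every interior vertex a.

    interior: list of lattice vertices (i, j) inside the contour;
    boundary: dict giving the prescribed value at each vertex on the contour.
    """
    index = {p: k for k, p in enumerate(interior)}
    size = len(interior)
    m = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for (i, j), row in index.items():
        m[row][row] = Fraction(4)  # equation written as 4 x_a - (neighbours) = ...
        for q in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if q in index:
                m[row][index[q]] -= 1        # neighbour is an unknown
            else:
                m[row][size] += boundary[q]  # neighbour lies on the contour
    # Gauss-Jordan elimination on the square system
    for col in range(size):
        pivot = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return {p: m[row][size] for p, row in index.items()}


interior = [(1, 1), (2, 1), (1, 2), (2, 2)]
boundary = {(i, j): i for i in range(4) for j in range(4)
            if i in (0, 3) or j in (0, 3)}
values = solve_lattice(interior, boundary)
zero_case = solve_lattice(interior, {p: 0 for p in boundary})
```

With these boundary values the solution is $x_{(i,j)} = i$ at every interior vertex, and with zero boundary values the only solution is the null one, in agreement with the assertion above.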

We clearly find ourselves in the situation of Corollary 1.13, and so it suffices to
verify that the homogeneous system associated with ours has only the null solution.
The associated homogeneous system corresponds to the case in which all the values
on the boundary of the figure are equal to zero. Let us suppose that it has a solution
$x_1, \ldots, x_N$ (where $N$ is the number of equations) that is not the null solution. If
among the numbers $x_i$ there is at least one that is positive, then let us denote by $x_a$
the largest such number. Then equation (1.22) (in which any of $x_b, x_c, x_d, x_e$ will
equal zero if the associated point $b, c, d, e$ lies on the contour) can be satisfied only
if $x_b = x_c = x_d = x_e = x_a$, since the arithmetic mean does not exceed the maximum
of the numbers and equals it only when all the numbers are equal to that maximum.

Fig. 1.5 Simple contour for an approximate solution of the Laplace equation

Fig. 1.6 Electrical network

We can reason analogously for the point $b$, and we find that the value of each
nearby point is equal to $x_a$. By continuing to move to the right, we shall eventually
reach a point $p$ on the contour, for which we obtain $x_p = x_a > 0$. But that contradicts
the assumption that the value of $x_p$ for the point $p$ on the contour is equal to zero.
For example, for the simple contour of Fig. 1.5, we obtain the equalities $x_b = x_a$,
$x_c = x_b = x_a$, $x_d = x_a$, $x_e = x_a$, $x_p = x_a$, the last of which is impossible, since
$x_a > 0$, $x_p = 0$. If all the numbers $x_i$ in our solution are nonpositive but not all
equal to zero, then we can repeat the above argument with $x_a$ taken as the smallest
of them (the largest of the numbers in absolute value).

The above arguments can be applied to proving the existence of a solution to the
Laplace equation (by passage to the limit). 2

2 Such a proof was given by Lyusternik, and both the proof and the argument we have given here
are taken from I.G. Petrovsky’s book Lectures on Partial Differential Equations, Dover Books on
Mathematics, 1992.

Example 1.22 This example concerns electrical networks. Such a network (see
Fig. 1.6) consists of conductors, each of which we shall consider to be uniform,
connected together at points called nodes. At one point in the network, a direct
current $i$ enters, while at another point, current $j$ exits. A uniform current flows due to
the homogeneity of each conductor.

Fig. 1.7 Decomposable network

We shall designate the conductors by the Greek letters $\alpha, \beta, \gamma, \ldots$, and the
strength of the current in conductor $\alpha$ by $i_\alpha$. Knowing the current $i$, we would like
to find the currents $i_\alpha, i_\beta, i_\gamma, \ldots$ for all the conductors $\alpha, \beta, \gamma, \ldots$ in the network
and the current $j$. We shall denote the nodes of the network by $a, b, c, \ldots$.

We need to make one additional refinement here. Since the current in a conductor
flows in a particular direction, it makes sense to indicate the direction with a sign.
This choice is arbitrary for each conductor, and we designate the direction by an
arrow. The nodes joined by a conductor are called its beginning and end, and the
arrow points from the beginning of the conductor to the end. The beginning of the
conductor $\alpha$ will be denoted by $\alpha'$ and the end will be denoted by $\alpha''$. The current
$i_\alpha$ will be considered positive if it flows in the direction of the arrow, and will be
considered negative otherwise. We shall say that the current $i_\alpha$ flows out of node
$a$ (flows into node $a$) if there is a conductor $\alpha$ with beginning (end) node $a$. For
example, in Fig. 1.6, the current $i_\alpha$ flows out of $a$ and flows into $b$; thus according
to our notation, $\alpha' = a$ and $\alpha'' = b$.

We shall assume further that the network in question satisfies the following
natural condition: Two arbitrary nodes $a$ and $b$ can be connected by some set of nodes
$c_1, \ldots, c_n$ in such a way that each of the pairs $a, c_1$; $c_1, c_2$; ...; $c_{n-1}, c_n$; $c_n, b$ is
connected by a conductor. We shall call this property of the network connectedness.
A network not satisfying this condition can be decomposed into a number of
subnetworks each of whose nodes are not connected to any nodes of any other subnetwork
(Fig. 1.7). We may then consider each subnetwork individually.

A collection of nodes $a_1, \ldots, a_n$ and connecting conductors $\alpha_1, \ldots, \alpha_n$ such that
conductor $\alpha_1$ connects nodes $a_1$ and $a_2$, conductor $\alpha_2$ connects nodes $a_2$ and $a_3$, ...,
conductor $\alpha_{n-1}$ connects nodes $a_{n-1}$ and $a_n$, and conductor $\alpha_n$ connects nodes $a_n$
and $a_1$ is called a closed circuit. For example, in Fig. 1.6, it is possible to select as
a closed circuit nodes $a, b, c, d, h$ and conductors $\alpha, \beta, \gamma, \eta$, or else, for example,
nodes $e, g, h, d$ and conductors $\mu, \vartheta, \xi, \delta$. The distribution of current in the closed
circuit is determined by two well-known laws of physics: Kirchhoff’s laws.

Kirchhoff’s first law applies to each node of a network and asserts that the sum
of the currents flowing into a node is equal to the sum of the currents flowing out of it.
More precisely, the sum of the currents in the conductors that have node $a$ at their
end is equal to the sum of the currents in the conductors for which node $a$ is the
beginning. This can be expressed by the following formula:

$$ \sum_{\alpha' = a} i_\alpha - \sum_{\beta'' = a} i_\beta = 0 \tag{1.23} $$

for every node $a$. For example, in Fig. 1.6, for the node $e$ we obtain the equation

h is ix ifi = 0* 

Kirchhoff’s second law applies to an arbitrary closed circuit consisting of
conductors in a network. Namely, if the conductors $\alpha_l$ form a circuit $C$, then with a
direction of such a circuit having been assigned, the law is expressed by the equation

$$ \sum_{\alpha_l \in C} \pm \rho_{\alpha_l} i_{\alpha_l} = 0, \tag{1.24} $$

where $\rho_{\alpha_l}$ is the resistance of the conductor $\alpha_l$ (which is always a positive
number, since the conductors are homogeneous), and where the plus sign is taken if the
selected direction of the conductor (indicated by an arrow) coincides with the
direction of the current in the circuit, and the minus sign is taken if it is opposite to the
direction of the current. For example, for the closed circuit $C$ with nodes $e, g, h, d$
as shown in Fig. 1.6 and with the indicated direction of the circuit, Kirchhoff’s law
gives the equation

$$ -\rho_\mu i_\mu + \rho_\vartheta i_\vartheta - \rho_\xi i_\xi + \rho_\delta i_\delta = 0. \tag{1.25} $$

We thereby obtain a system of linear equations in which the unknowns are
$i_\alpha, i_\beta, i_\gamma, \ldots$ and $j$. Such a system of equations is encountered in a number of
problems, such as the allocation of loads in a transport network and the distribution of
water in a system of conduits.

Our goal is now to show that the system of equations thus obtained (for the given
network and current $i$) has a unique solution.

First, we observe that the outflowing current $j$ is equal to $i$. This is obvious from
physical considerations, but we must derive it from the equations of Kirchhoff’s
law. To this end, let us add together all equations (1.23) of Kirchhoff’s first law for all
nodes $a$ of our network. How often do we encounter conductor $\alpha$ in the resulting
equation? We encounter it once when we examine the equation corresponding to the
node $a = \alpha'$ and another time for $a = \alpha''$. Furthermore, the current $i_\alpha$ enters into
the two equations with opposite signs, which means that they cancel. All that will
remain in the resulting equation is the current $i$ (for the point into which the current
flows) and $-j$ (for the point where the current flows out). This yields the equation
$i - j = 0$, that is, $i = j$.

Now let us note that not all the equations (1.24) corresponding to Kirchhoff’s
second law are independent. We shall call a closed circuit $\alpha_1, \ldots, \alpha_n$ a cell if
every pair of its nodes is connected only by a conductor from among $\alpha_1, \ldots, \alpha_n$ and
by no others. Every closed circuit can be decomposed into a number of cells. For
example, in Fig. 1.6, the circuit $C$ with nodes $e, g, h, d$ and conductors $\mu, \vartheta, \xi, \delta$
can be decomposed into two cells: one with nodes $e, g, h$ and conductors $\mu, \vartheta, \lambda$,
and the other with nodes $e, h, d$ and conductors $\lambda, \xi, \delta$. In this case, equation (1.24)
corresponding to the circuit is the sum of the equations corresponding to the
individual cells (with a proper choice of directions for the circuits). For example, equation
(1.25) for the circuit $C$ with nodes $e, g, h, d$ is the sum of equations

$$ -\rho_\mu i_\mu + \rho_\vartheta i_\vartheta + \rho_\lambda i_\lambda = 0, \qquad -\rho_\lambda i_\lambda - \rho_\xi i_\xi + \rho_\delta i_\delta = 0, $$

corresponding to the cells with nodes $e, g, h$ and $e, h, d$.

Fig. 1.8 Circuits for the proof of Euler’s theorem

Thus, we can restrict our attention to equations of the cells of the network. Let us
prove, then, that in the entire system of equations (1.23) and (1.24) corresponding
to Kirchhoff’s first and second laws, the number of equations will be equal to the
number of unknowns. We shall denote by $N_{\text{cell}}$, $N_{\text{cond}}$, and $N_{\text{node}}$ the numbers of
cells, conductors, and nodes of the network. The number of unknowns $i_\alpha$ and $j$ is
equal to $N_{\text{cond}} + 1$. Each cell and each node contributes one equation. This means
that the number of equations is equal to $N_{\text{cell}} + N_{\text{node}}$, and we need to prove the
equality

$$ N_{\text{cell}} + N_{\text{node}} = N_{\text{cond}} + 1. \tag{1.26} $$

This is a familiar equality. It comes from topology and is known as Euler’s theorem.
It is very easy to prove, as we shall now demonstrate.

Let us make the important observation that our network is located in the plane:
the conductors do not have to be straight line segments, but they are required to
be nonintersecting curves in the plane. We shall use induction on the number of
cells. Let us delete the “outer” side of one of the “external” cells (for example, side
$(b, c, d)$ in Fig. 1.8(a)). In this case, the number of cells $N_{\text{cell}}$ is reduced by 1.

If in the “deleted” side there were $k$ conductors, then the number $N_{\text{cond}}$ will
decrease by $k$, while the number $N_{\text{node}}$ will decrease by $k - 1$. Altogether, the number
$N_{\text{cell}} - N_{\text{cond}} + N_{\text{node}} - 1$ does not change. In this process, the property of
connectedness is not destroyed. Indeed, any two nodes of the initial network can be
connected by the sequence of nodes $c_1, \ldots, c_n$. If even part of this sequence
consisted of vertices of the “deleted” sides of our cell, then we could replace them with
the sequence of nodes of its “nondeleted” sides.

Fig. 1.9 Closed circuit containing nodes x and t

This process reduces the proof to the case $N_{\text{cell}} = 0$, that is, to a network that
does not contain a closed circuit. We now must prove that for such a network,
$N_{\text{node}} - N_{\text{cond}} = 1$. We now use induction on the number $N_{\text{cond}}$. Let us remove
any “external” conductor at least one end of which is not the end of another
conductor (for example, the conductor $\alpha$ in Fig. 1.8(b)). Then both numbers $N_{\text{cond}}$ and
$N_{\text{node}}$ are reduced by 1, and the number $N_{\text{cond}} - N_{\text{node}}$ remains unchanged. We may
easily convince ourselves that in this case, the property of connectedness is again
preserved. As a result, we arrive at the case $N_{\text{cond}} = 0$ but $N_{\text{node}} > 0$. Since the
network must be connected, we have $N_{\text{node}} = 1$, and it is clear that we have the equality
$N_{\text{node}} - N_{\text{cond}} = 1$.
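Equality (1.26) is easy to check numerically on a simple family of planar networks. The Python sketch below uses rectangular lattices of nodes, a family chosen by us purely for illustration; the function name `grid_counts` is hypothetical, and the counts rely on the observation that in such a lattice the cells are exactly the small squares.

```python
def grid_counts(rows, cols):
    """Numbers of cells, conductors and nodes of a rectangular lattice network
    with rows x cols nodes, each node joined to its horizontal and vertical
    neighbours."""
    nodes = rows * cols
    conductors = rows * (cols - 1) + cols * (rows - 1)
    cells = (rows - 1) * (cols - 1)  # each small square is one cell
    return cells, conductors, nodes


checks = []
for rows in range(1, 6):
    for cols in range(1, 6):
        n_cell, n_cond, n_node = grid_counts(rows, cols)
        # equality (1.26): N_cell + N_node = N_cond + 1
        checks.append(n_cell + n_node == n_cond + 1)
```

The case of a single row (or a single column) corresponds to a network with $N_{\text{cell}} = 0$, for which the equality reduces to $N_{\text{node}} - N_{\text{cond}} = 1$.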

We now note an important property of networks satisfying relationship (1.24)
that emerges from Kirchhoff’s second law (for given currents $i_\alpha$). With each node $a$
one can associate a number $r_a$ such that for an arbitrary conductor $\alpha$ beginning at $a$
and ending at $b$, the following equation is satisfied:

$$ \rho_\alpha i_\alpha = r_a - r_b. \tag{1.27} $$

To determine these numbers we shall choose some node $x$ and assign to it the
number $r_x$ arbitrarily. Then for each node $y$ connected to $x$ by some conductor $\alpha$,
we set $r_y = r_x - \rho_\alpha i_\alpha$ if $x$ is at the beginning of $\alpha$ and $y$ at the end, and
$r_y = r_x + \rho_\alpha i_\alpha$ in the opposite case. Then in exactly the same way, we determine the
number $r_z$ for each node connected by a conductor to one of the examined nodes
$x$, $y$, etc. In view of the connectedness condition, we will eventually reach every
node $t$ of our network, to which we will have assigned, say, the number $r_t$. But it
is still necessary to show that this number $r_t$ is independent of the path by which
we arrive from $x$ to $t$ (that is, which point we chose as $y$, then as $z$, and so on). To
accomplish this, it suffices to note that a pair of distinct paths linking nodes $x$ and
$t$ forms a closed circuit (Fig. 1.9), and the relationship that we require follows from
Kirchhoff’s second law (equations (1.24)).

It is now easy to show that the system of linear equations obtained from
Kirchhoff’s first law (1.23) for all nodes and from Kirchhoff’s second law (1.24) for all
cells has a unique solution. To do so, it suffices, as we know, to show that the
associated homogeneous system has only the null solution. This homogeneous system
is obtained for $i = j = 0$.

Of course, “physically,” it is completely obvious that if we put no current into the
network, then there will be no current in its conductors, but we must prove that this
follows in particular from Kirchhoff’s laws.

To this end, consider the sum $\sum_\alpha \rho_\alpha i_\alpha^2$, where the sum is over all conductors $\alpha$
of our network. Let us break the term $\rho_\alpha i_\alpha^2$ into two factors: $\rho_\alpha i_\alpha^2 = (\rho_\alpha i_\alpha) \cdot i_\alpha$.
We replace the first factor by $r_a - r_b$ on the basis of relation (1.27), where $a$ is
the beginning and $b$ the end of conductor $\alpha$. We obtain the sum $\sum_\alpha (r_a - r_b) i_\alpha$,
and we collect the terms in which the first factor $r_a$ or $-r_b$ is associated with a
particular node $c$. Then we can pull the number $r_c$ outside the parentheses, and
inside will remain the sum $\sum_{\alpha' = c} i_\alpha - \sum_{\beta'' = c} i_\beta$, which is equal to zero on account
of Kirchhoff’s first law (1.23). We finally obtain that $\sum_\alpha \rho_\alpha i_\alpha^2 = 0$, and since the
resistance $\rho_\alpha$ is positive, all the currents $i_\alpha$ must be equal to zero.
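As an illustration of the whole example, the following Python sketch writes down and solves Kirchhoff’s equations for the smallest network containing a cell. Everything about this network is our own assumption: two nodes $a$ and $b$ joined by two conductors $\alpha$ and $\beta$, both directed from $a$ to $b$, with resistances 1 and 2, and an incoming current $i = 3$. The helper `solve_square` is likewise a hypothetical name; it performs Gauss–Jordan elimination with exact fractions.

```python
from fractions import Fraction


def solve_square(a, b):
    """Solve a square linear system that has a unique solution."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


i_in = 3
rho_alpha, rho_beta = 1, 2
# unknowns, in this order: i_alpha, i_beta, j
equations = [
    [1, 1, 0],                  # node a: i_alpha + i_beta = i
    [1, 1, -1],                 # node b: i_alpha + i_beta - j = 0
    [rho_alpha, -rho_beta, 0],  # the cell: rho_alpha i_alpha - rho_beta i_beta = 0
]
i_alpha, i_beta, j_out = solve_square(equations, [i_in, 0, 0])

# the homogeneous system (i = 0) should have only the null solution
null_case = solve_square(equations, [0, 0, 0])
```

Here $N_{\text{cell}} + N_{\text{node}} = 1 + 2$ equations match $N_{\text{cond}} + 1 = 3$ unknowns, the computed outflowing current equals the incoming one, and with $i = 0$ all currents vanish.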

To conclude, we remark that networks appearing in mathematics are called
graphs, and “conductors” become the edges of the graph. In the case that every
edge of a graph is assigned a direction (provided with arrows, for example), the
graph is then said to be directed. Euler’s theorem holds not for arbitrary graphs, but
only for those, like the networks that we have considered in this example, that can
be drawn in the plane without intersections of edges (for which we omit a precise
definition). Such graphs are called planar.


Chapter 2

Matrices and Determinants


2.1 Determinants of Orders 2 and 3

We begin by considering a system of two equations in two unknowns:

$$
\begin{cases}
a_{11} x_1 + a_{12} x_2 = b_1, \\
a_{21} x_1 + a_{22} x_2 = b_2.
\end{cases}
$$


In order to determine $x_1$, we attempt to eliminate $x_2$ from the system. To accomplish
this, it suffices to multiply the first equation by $a_{22}$ and add to it the second equation
multiplied by $-a_{12}$. We obtain

$$ (a_{11} a_{22} - a_{21} a_{12}) x_1 = b_1 a_{22} - b_2 a_{12}. $$

We consider the case in which $a_{11} a_{22} - a_{21} a_{12} \neq 0$. Then we obtain

$$ x_1 = \frac{b_1 a_{22} - b_2 a_{12}}{a_{11} a_{22} - a_{21} a_{12}}. \tag{2.1} $$

Analogously, to find the value $x_2$, we multiply the second equation by $a_{11}$ and add
to it the first multiplied by $-a_{21}$. With the same assumption ($a_{11} a_{22} - a_{21} a_{12} \neq 0$),
we obtain

$$ x_2 = \frac{b_2 a_{11} - b_1 a_{21}}{a_{11} a_{22} - a_{21} a_{12}}. \tag{2.2} $$


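The expression $a_{11} a_{22} - a_{12} a_{21}$ appearing in the denominator of formulas (2.1)
and (2.2) is called the determinant of the matrix
$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ (it is called a determinant
of order 2, or a $2 \times 2$ determinant) and is denoted by
$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$. Therefore, we have
by definition,

$$ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{21} a_{12}. \tag{2.3} $$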


I.R. Shafarevich, A. O. Remizov, Linear Algebra and Geometry,

Fig. 2.1 Calculating (a) the area of a triangle and (b) the volume of a tetrahedron

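We see that in the numerators of formulas (2.1) and (2.2) there also appears an
expression of the form (2.3). Using the notation we have introduced, we can rewrite
these formulas in the following form:

$$
x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}
           {\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}},
\qquad
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}
           {\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}.
\tag{2.4}
$$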
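Formulas (2.3) and (2.4) translate directly into code. The short Python sketch below is an illustration under our own conventions: the names `det2` and `cramer2` are hypothetical, the coefficients are assumed to be integers, and the result is returned as exact fractions.

```python
from fractions import Fraction


def det2(a, b, c, d):
    """Second-order determinant with rows (a, b) and (c, d), formula (2.3)."""
    return a * d - c * b


def cramer2(a11, a12, a21, a22, b1, b2):
    """Solve a11 x1 + a12 x2 = b1, a21 x1 + a22 x2 = b2 by formulas (2.4)."""
    d = det2(a11, a12, a21, a22)
    if d == 0:
        raise ValueError("formulas (2.4) require a nonzero determinant")
    x1 = Fraction(det2(b1, a12, b2, a22), d)
    x2 = Fraction(det2(a11, b1, a21, b2), d)
    return x1, x2


# 2 x1 + x2 = 5,  x1 + 3 x2 = 10
x1, x2 = cramer2(2, 1, 1, 3, 5, 10)
```

Substituting the computed values back into both equations confirms the solution $x_1 = 1$, $x_2 = 3$.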


The expression (2.3) is useful for more than a symmetric way of writing solutions
of two equations in two unknowns. It is encountered in a great number of situations,
and therefore has a special name and notation. For example, consider two points $A$
and $B$ in the plane with respective coordinates $(x_1, y_1)$ and $(x_2, y_2)$; see Fig. 2.1(a).
It is not difficult to see that the area of triangle $OAB$ is equal to $(x_1 y_2 - y_1 x_2)/2$. For
example, we could subtract from the area of triangle $OBD$ the area of the rectangle
$ACDE$ and the areas of triangles $ABC$ and $OAE$. We thereby obtain

$$ \triangle OAB = \frac{1}{2} \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}. $$
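A quick numerical check of this area formula can be made in Python. The function name `triangle_area` is our own, and so is the decision to take the absolute value: the determinant changes sign when $A$ and $B$ are interchanged, so the absolute value is an addition of ours that makes the result independent of the order of the two points.

```python
def triangle_area(a, b):
    """Area of the triangle with vertices O = (0, 0), A = a and B = b,
    computed as half the absolute value of the determinant |x1 y1; x2 y2|."""
    (x1, y1), (x2, y2) = a, b
    return abs(x1 * y2 - y1 * x2) / 2


# right triangle with legs 4 and 3 along the coordinate axes
area = triangle_area((4, 0), (0, 3))
```

For this right triangle the formula gives half the product of the legs, as expected.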

Having in hand formulas for solutions of systems of two equations in two
unknowns, we can solve some other systems. Consider, for example, the following
homogeneous system of linear equations in three unknowns:

$$
\begin{cases}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = 0, \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = 0.
\end{cases}
\tag{2.5}
$$


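We are interested in nonnull solutions of this system, that is, solutions in which at
least one $x_i$ is not equal to zero. Suppose, for example, that $x_3 \neq 0$. Dividing both
sides by $-x_3$ and setting $-x_1/x_3 = y_1$, $-x_2/x_3 = y_2$, we can write system (2.5) in
the form

$$
\begin{cases}
a_{11} y_1 + a_{12} y_2 = a_{13}, \\
a_{21} y_1 + a_{22} y_2 = a_{23},
\end{cases}
$$

which is in a form we have considered. If
$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0$, then formula (2.4) gives the
expressions

$$
y_1 = -\frac{x_1}{x_3}
    = \frac{\begin{vmatrix} a_{13} & a_{12} \\ a_{23} & a_{22} \end{vmatrix}}
           {\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}},
\qquad
y_2 = -\frac{x_2}{x_3}
    = \frac{\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}}
           {\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}.
$$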

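Unsurprisingly, we determined from system (2.5) not $x_1, x_2, x_3$, but only their
mutual relationships: from such a homogeneous system, it easily follows that if
$(c_1, c_2, c_3)$ is a solution and $p$ is an arbitrary number, then $(p c_1, p c_2, p c_3)$ is also a
solution. Therefore, we can set

$$
x_1 = -\begin{vmatrix} a_{13} & a_{12} \\ a_{23} & a_{22} \end{vmatrix},
\qquad
x_2 = -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix},
\qquad
x_3 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
\tag{2.6}
$$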


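and say that an arbitrary solution is obtained from this one by multiplying all the $x_i$
by $p$. In order to give our solution a somewhat more symmetric form, we observe
that we always have

$$ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = -\begin{vmatrix} b & a \\ d & c \end{vmatrix}. $$

This is easily checked with the help of formula (2.3). Therefore, (2.6) can be written
in the form

$$
x_1 = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix},
\qquad
x_2 = -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix},
\qquad
x_3 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.
\tag{2.7}
$$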


Formulas (2.7) give values for $x_1, x_2, x_3$ if we cross out in turn the first, second, and
third columns and then take the obtained second-order determinants with alternating
signs. We recall that these formulas were obtained on the assumption that

$$ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0. $$

It is easy to check that the assertion we have proved is valid if at least one of the three
determinants appearing in (2.7) is not equal to zero. If all three determinants are
zero, then, of course, formula (2.7) again gives a solution, namely the null solution,
but now we can no longer assert that all solutions are obtained by multiplying by a
number (indeed, this is not true).
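The rule of crossing out columns in formulas (2.7) can be checked on a numerical example. In the Python sketch below, the function names and the sample coefficients are our own; the code simply evaluates the three second-order determinants with alternating signs and substitutes the result back into both equations of a system of type (2.5).

```python
def det2(a, b, c, d):
    """Second-order determinant with rows (a, b) and (c, d)."""
    return a * d - c * b


def homogeneous_solution(row1, row2):
    """A solution (x1, x2, x3) of the two homogeneous equations with
    coefficient rows row1 and row2, computed by formulas (2.7)."""
    a11, a12, a13 = row1
    a21, a22, a23 = row2
    x1 = det2(a12, a13, a22, a23)    # first column crossed out
    x2 = -det2(a11, a13, a21, a23)   # second column crossed out, sign changed
    x3 = det2(a11, a12, a21, a22)    # third column crossed out
    return x1, x2, x3


row1, row2 = (1, 2, 3), (4, 5, 6)
solution = homogeneous_solution(row1, row2)
residuals = [sum(a * x for a, x in zip(row, solution)) for row in (row1, row2)]
```

Both residuals vanish, and any multiple of the computed triple is again a solution.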

Let us now consider the case of a system of three equations in three unknowns:

$$
\begin{cases}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1, \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = b_2, \\
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 = b_3.
\end{cases}
$$


We again would like to eliminate $x_2$ and $x_3$ from the system in order to obtain a
value for $x_1$. To this end, we multiply the first equation by $c_1$, the second by $c_2$,
and the third by $c_3$ and add them. We shall choose $c_1$, $c_2$, and $c_3$ such that
in the equation obtained, the terms with $x_2$ and $x_3$ become equal to zero. Setting the
associated coefficients to zero, we obtain for $c_1$, $c_2$, and $c_3$ the following system of
equations:

$$
\begin{cases}
a_{12} c_1 + a_{22} c_2 + a_{32} c_3 = 0, \\
a_{13} c_1 + a_{23} c_2 + a_{33} c_3 = 0.
\end{cases}
$$

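This system is of the same type as (2.5). Therefore, we can use the formula (2.6)
that we derived and take

$$c_1 = \begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix}, \quad
c_2 = -\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix}, \quad
c_3 = \begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix}.$$

As a result, we obtain for $x_1$ the equation

$$\left( a_{11}\begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix}
- a_{21}\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix}
+ a_{31}\begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix} \right) x_1
= b_1\begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix}
- b_2\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix}
+ b_3\begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix}. \tag{2.8}$$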


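The coefficient of $x_1$ in (2.8) is called the determinant of the matrix

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

and is denoted by

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

Therefore, by definition,

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
+ a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}. \tag{2.9}$$

(Here each second-order determinant from (2.8) is written with its rows and columns
interchanged, which by formula (2.3) does not change its value.)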


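It is clear that the right-hand side of equation (2.8) is obtained from the coefficient
of $x_1$ by substituting $b_i$ for $a_{i1}$, $i = 1, 2, 3$. Therefore, equality (2.8) can be written
in the form

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} x_1
= \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}.$$

We shall assume that the coefficient of $x_1$, that is, the determinant (2.9), is different
from zero. Then we have

$$x_1 = \frac{\begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}}
{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}. \tag{2.10}$$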


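We can easily carry out the same calculations for $x_2$ and $x_3$. We obtain then the
formulas

$$x_2 = \frac{\begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}}
{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}, \qquad
x_3 = \frac{\begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix}}
{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}.$$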
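The formulas for $x_1, x_2, x_3$ translate directly into a short program. The sketch below is only an illustration under stated assumptions: the names `det3` and `solve3` and the sample system are hypothetical, and exact arithmetic with `fractions.Fraction` is chosen so that the division in (2.10) introduces no rounding.

```python
from fractions import Fraction


def det3(m):
    """Third-order determinant by formula (2.9): expansion along the first column."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * (a22 * a33 - a23 * a32)
            - a21 * (a12 * a33 - a13 * a32)
            + a31 * (a12 * a23 - a13 * a22))


def solve3(m, b):
    """Solve a 3 x 3 system by formula (2.10) and its analogues for x2 and x3."""
    d = det3(m)
    if d == 0:
        raise ValueError("determinant is zero; formula (2.10) does not apply")
    solution = []
    for col in range(3):
        # Replace column `col` of the coefficient matrix by the right-hand sides.
        replaced = [[b[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(Fraction(det3(replaced), d))
    return solution


m = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
print(solve3(m, b))
```

For this sample system the determinant of the matrix is $-1$, so the assumption that it is different from zero is satisfied.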


Just as second-order determinants express area, third-order determinants enter
into a number of formulas for volume. For example, the volume of a tetrahedron
with vertices at the points $O$ (the coordinate origin) and $A$, $B$, $C$ with coordinates
$(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$ (see Fig. 2.1(b)), is equal to

$$\frac{1}{6}\begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}.$$
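The volume formula can be tried out on concrete points. In the sketch below, the names `det3` and `tetrahedron_volume` are illustrative assumptions, coordinates are assumed to be integers, and the absolute value is an added assumption made here so that the result is nonnegative whatever the order of the vertices.

```python
from fractions import Fraction


def det3(m):
    """Third-order determinant by formula (2.9)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * (a22 * a33 - a23 * a32)
            - a21 * (a12 * a33 - a13 * a32)
            + a31 * (a12 * a23 - a13 * a22))


def tetrahedron_volume(a, b, c):
    """Volume of the tetrahedron with vertices O, a, b, c (integer coordinates)."""
    # abs() is an assumption added for this sketch; see the lead-in.
    return Fraction(abs(det3([a, b, c])), 6)


print(tetrahedron_volume((1, 0, 0), (0, 1, 0), (0, 0, 1)))
print(tetrahedron_volume((1, 2, 3), (2, 4, 6), (0, 1, 5)))  # A, B, O collinear
```

In the second call the points $O$, $A$, $B$ lie on one line, so the tetrahedron degenerates and the determinant vanishes.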


This shows that the notion of determinant that we have introduced is encountered
in a number of branches of mathematics. We now return to the problem of solving
systems of $n$ linear equations in $n$ unknowns.

It is clear that we can apply the same line of reasoning to a system consisting of
four equations in four unknowns. To do so, we need to derive formulas analogous to
(2.7) for the solution of a homogeneous system of three equations in four unknowns
based on formula (2.9). Then to eliminate $x_2, x_3, x_4$ in a system of four equations in
four unknowns, we multiply the equations by the coefficients $c_1, c_2, c_3, c_4$ and add.
The coefficients $c_1, c_2, c_3, c_4$ will satisfy a homogeneous system of three equations,
which we are able to solve. This will give us uniquely solvable linear equations in
the unknowns $x_1, \ldots, x_4$ (as in the previous cases with two and three variables, the
idea is the same for any number of unknowns). We call the coefficient of the unknowns
a fourth-order determinant. Solving the linear equations thus obtained, we
arrive at formulas expressing the values of the unknowns $x_1, \ldots, x_4$, analogous to
formula (2.10). Thus it is possible to obtain solutions to systems with an arbitrarily
large number of equations and with the same number of unknowns.

To derive a formula for the solution of $n$ equations in $n$ unknowns, we have to
introduce the notion of the determinant of the $n \times n$ square matrix

$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}, \tag{2.11}$$

that is, a determinant of order $n$.

Our previous analysis suggests that we define the $n \times n$ determinant by induction:
For $n = 1$, we consider the determinant of the matrix $(a_{11})$ to be equal to the number
$a_{11}$, and assuming that the determinant of order $n - 1$ has been defined, we proceed
to define the determinant of order $n$.

Formulas (2.3) and (2.9) suggest how this should be done. In both formulas,
the determinant of order $n$ (that is, two or three) was expressed in the form of an
algebraic sum of elements of the first column of matrix (2.11) (that is, of elements
$a_{11}, a_{21}, \ldots, a_{n1}$) multiplied by determinants of order $n - 1$. The determinant of
order $n - 1$ by which a given element of the first column was multiplied was obtained
by deleting from the original matrix the first column and the row in which the given
element was located. Then the $n$ products were added with alternating signs.

We shall give a general definition of an $n \times n$ determinant in the following section.
The sole purpose of the discussion above was to make such a definition intelligible.
The formulas introduced in this section will not be used again in this book.
Indeed, they will be corollaries of formulas that we shall derive for determinants of
arbitrary order.


2.2 Determinants of Arbitrary Order

A determinant of the square $n \times n$ matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}$$

is a number associated with the given matrix. It is defined inductively on the number
$n$. For $n = 1$, the determinant of the matrix $(a_{11})$ is simply the number $a_{11}$.
Suppose that we know how to compute the determinant of an arbitrary matrix of
order $n - 1$. We then define the determinant of a square matrix $A$ by the formula

$$|A| = a_{11}D_1 - a_{21}D_2 + a_{31}D_3 - a_{41}D_4 + \cdots + (-1)^{n+1}a_{n1}D_n, \tag{2.12}$$

where $D_k$ is the determinant of order $n - 1$ obtained from the matrix $A$ by deleting
the first column and the $k$th row. (The reader should verify that for $n = 2$ and $n = 3$
we obtain the same formulas for determinants of order 2 and 3 presented in the
previous section.)
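The inductive definition (2.12) can be transcribed almost literally into a recursive function. The following is a minimal sketch, assuming a matrix stored as a list of rows; the name `det` is an illustrative choice. It is meant to mirror the definition, not to be efficient: the number of operations grows like $n!$.

```python
def det(a):
    """Determinant of a square matrix (list of rows) by the inductive definition (2.12)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for k in range(n):
        # D_k: delete the first column and the k-th row (k counted from 0 here,
        # so the sign (-1)**k corresponds to (-1)**(k+1) in 1-based numbering).
        d_k = det([row[1:] for i, row in enumerate(a) if i != k])
        total += (-1) ** k * a[k][0] * d_k
    return total


print(det([[1, 2], [3, 4]]))
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
```

For $n = 2$ the recursion reduces to $a_{11}a_{22} - a_{21}a_{12}$, which is the check the parenthetical remark above asks the reader to carry out.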

Let us now introduce some useful notation and terminology. The determinant of
the matrix $A$ is denoted by

$$\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}$$

or simply by $|A|$, for short. If we delete the $i$th row and the $j$th column of the
matrix $A$ and preserve the ordering of the remaining elements, then we end up with
a matrix of order $n - 1$. Its determinant is denoted by $M_{ij}$ and is called a minor
of the matrix $A$, or more precisely, the minor associated with the element $a_{ij}$. With
this notation, (2.12) can be written in the form

$$|A| = a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31} - \cdots + (-1)^{n+1}a_{n1}M_{n1}. \tag{2.13}$$

This formula can be expressed in words thus: The determinant of an $n \times n$ matrix is
equal to the sum of the elements of the first column each multiplied by its associated
minor, where the sum is taken with alternating signs, beginning with plus.
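As a worked illustration of formula (2.13), here is the expansion of one third-order determinant along its first column. The matrix is chosen arbitrarily for this example and does not come from the text.

```latex
\[
\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{vmatrix}
= 1 \cdot \begin{vmatrix} 5 & 6 \\ 8 & 10 \end{vmatrix}
- 4 \cdot \begin{vmatrix} 2 & 3 \\ 8 & 10 \end{vmatrix}
+ 7 \cdot \begin{vmatrix} 2 & 3 \\ 5 & 6 \end{vmatrix}
= 1 \cdot 2 - 4 \cdot (-4) + 7 \cdot (-3) = -3.
\]
```

Here the minors are $M_{11} = 2$, $M_{21} = -4$, and $M_{31} = -3$, and the signs alternate beginning with plus.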


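Example 2.1 Suppose a particular square matrix $A$ of order $n$ has the property that
all of its elements in the first column are equal to zero except for the element in the
first row. That is,

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n2} & \cdots & a_{nn}
\end{pmatrix}.$$

Then in (2.13), all the terms except the first are equal to zero, and formula (2.13)
gives the equality

$$|A| = a_{11}|A'|, \tag{2.14}$$

where the matrix

$$A' = \begin{pmatrix}
a_{22} & \cdots & a_{2n} \\
\vdots & \ddots & \vdots \\
a_{n2} & \cdots & a_{nn}
\end{pmatrix}$$

is of order $n - 1$.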


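There is a useful generalization of (2.14) that we shall now prove.

Theorem 2.2 We have the following formula for the determinant of a square matrix
$A$ of order $n + m$ for which every element in the intersection of the first $n$ columns
and last $m$ rows is zero:

$$|A| = \begin{vmatrix}
a_{11} & \cdots & a_{1n} & a_{1\,n+1} & \cdots & a_{1\,n+m} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{nn} & a_{n\,n+1} & \cdots & a_{n\,n+m} \\
0 & \cdots & 0 & b_{11} & \cdots & b_{1m} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{m1} & \cdots & b_{mm}
\end{vmatrix}
= \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}
\cdot \begin{vmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mm} \end{vmatrix}. \tag{2.15}$$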


Proof We again make use of the definition of a determinant, namely formula (2.13),
now of order $n + m$, and we again employ induction on $n$. In our case, the last $m$
terms of (2.13) are equal to zero, and so we obtain

$$|A| = a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31} - \cdots + (-1)^{n+1}a_{n1}M_{n1}. \tag{2.16}$$

It is now clear that $M_{i1}$ is a determinant of the same type as $|A|$, but of order
$n - 1 + m$. Therefore, by the induction hypothesis, we can apply the theorem to this
determinant, obtaining

$$M_{i1} = \overline{M}_{i1} \cdot \begin{vmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mm} \end{vmatrix}, \tag{2.17}$$

where $\overline{M}_{i1}$ has the same meaning as in (2.13) for the determinant $|\overline{A}|$ of the matrix
$\overline{A}$ formed by the first $n$ rows and columns of $A$. Substituting
expressions (2.17) into (2.16) and using (2.13) for $|\overline{A}|$, we obtain relation (2.15).
The theorem is proved. □
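Theorem 2.2 is easy to test numerically. The sketch below assembles a matrix of the shape appearing in (2.15) from three blocks and compares both sides of the formula; the helper names `det` and `block_triangular` and the sample blocks are assumptions made for illustration.

```python
def det(a):
    """Determinant by expansion along the first column, formula (2.13)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** k * a[k][0] * det([r[1:] for i, r in enumerate(a) if i != k])
               for k in range(len(a)))


def block_triangular(a_block, c_block, b_block):
    """Matrix with a_block (n x n) top left, c_block (n x m) top right,
    zeros bottom left, and b_block (m x m) bottom right."""
    n, m = len(a_block), len(b_block)
    top = [a_block[i] + c_block[i] for i in range(n)]
    bottom = [[0] * n + b_block[i] for i in range(m)]
    return top + bottom


a_block = [[1, 2], [3, 4]]
b_block = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
c_block = [[5, 6, 7], [8, 9, 1]]   # arbitrary: it does not affect the result
big = block_triangular(a_block, c_block, b_block)
print(det(big), det(a_block) * det(b_block))
```

Changing the entries of `c_block` leaves the first printed value unchanged, in agreement with the fact that these entries do not appear on the right-hand side of (2.15).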


Remark 2.3 One may well ask why in our definition the first column played a special
role and what sort of expressions we might obtain were we to formulate the
definition in terms not of the first column, but of the second, third, ..., column. As
we shall see, the expression obtained will differ from the determinant by at most a
sign.


Now let us consider some of the basic properties of determinants. Later on, we
shall see that in the theory of determinants, just as in the theory of systems of linear
equations, an important role is played by elementary row operations. Let us note
that elementary operations like those of type I and type II can be applied to the rows
of a matrix whether or not it is the matrix of a system of equations. Theorem 1.15
shows that an arbitrary matrix can be transformed into echelon and triangular form.

Therefore, it will be useful to figure out how elementary operations on the rows of
a matrix affect the matrix's determinant. In connection with this, we shall introduce
some special notation for the rows of a matrix $A$: We shall denote by $\boldsymbol{a}_i$ the $i$th row
of $A$, $i = 1, \ldots, n$. Thus

$$\boldsymbol{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in}).$$

We shall prove several important properties of determinants. We shall prove Properties
2.4, 2.6, and 2.7 below by induction on the order $n$ of the determinant. For $n = 1$
(or for Property 2.6, for $n = 2$), these properties are obvious, and we shall omit a
proof. We can therefore assume in the proof that the properties have been proved for
determinants of order $n - 1$.

By definition (2.13), a determinant is a function that assigns to the matrix $A$ a
certain number $|A|$. We shall now assume that all the rows of the matrix $A$ except
for one, let us say the $i$th, are fixed, and we shall explain how the determinant
depends on the elements of the $i$th row $\boldsymbol{a}_i$.

Property 2.4 The determinant of a matrix is a linear function of the elements of an
arbitrary row of the matrix.

Proof Let us suppose that we wish to prove this property for the $i$th row of matrix $A$.
We shall use formula (2.13) and show that every term in it is a linear function of the
elements of the $i$th row. For this, it suffices to choose numbers $d_{1j}, d_{2j}, \ldots, d_{nj}$ such
that

$$\pm a_{j1}M_{j1} = d_{1j}a_{i1} + d_{2j}a_{i2} + \cdots + d_{nj}a_{in}$$

for all $j = 1, 2, \ldots, n$ (see the definition of linear function on p. 2). We begin with
the term $\pm a_{i1}M_{i1}$. Since the minor $M_{i1}$ does not depend on the elements of the $i$th
row (the $i$th row is ignored in the calculation), it is simply a constant as a function
of the $i$th row. Let us set $d_{1i} = \pm M_{i1}$ and $d_{2i} = d_{3i} = \cdots = d_{ni} = 0$. Then the first
term is represented in the required form, and indeed is a linear function of the $i$th
row of the matrix $A$. For the term $\pm a_{j1}M_{j1}$, for $j \ne i$, the element $a_{j1}$ does not
appear in the $i$th row, but all the elements of the $i$th row of matrix $A$ other than $a_{i1}$
appear in some row of the minor $M_{j1}$. Therefore, by the induction hypothesis, $M_{j1}$
is a linear function of these elements, that is,

$$M_{j1} = d'_{2j}a_{i2} + \cdots + d'_{nj}a_{in}$$

for some numbers $d'_{2j}, \ldots, d'_{nj}$. Setting $d_{2j} = a_{j1}d'_{2j}, \ldots, d_{nj} = a_{j1}d'_{nj}$, and
$d_{1j} = 0$, we convince ourselves that $a_{j1}M_{j1}$ is a linear function of the $i$th row of
matrix $A$, but this means that such is also the case for the function $\pm a_{j1}M_{j1}$. Therefore,
$|A|$ is the sum of linear functions of the elements of the $i$th row, and it follows
that $|A|$ is itself a linear function (see p. 4). □

Corollary 2.5 If we apply Theorem 1.3 to a determinant as a function of its $i$th
row,¹ then we obtain the following:

1. Multiplication of each of the elements of the $i$th row of a matrix $A$ by the number
$p$ multiplies the determinant $|A|$ by the same number.

2. If all elements of the $i$th row of matrix $A$ are of the form $a_{ij} = b_j + c_j$, then its
determinant $|A|$ is equal to the sum of the determinants of two matrices, in each
of which all the elements other than the elements in the $i$th row are the same as
in the original, and in the $i$th row of the first determinant, instead of the elements
$a_{ij}$, one has the numbers $b_j$, while in the $i$th row of the other one, the numbers
are $c_j$.

1 We are being a bit sloppy with language here. We have defined the determinant as a function that
assigns a number to a matrix, so when we speak of the “rows of a determinant,” this is shorthand
for the rows of the underlying matrix.

Property 2.6 The transposition of two adjacent rows of a determinant changes its sign.

Proof We again begin with formula (2.13). Let us assume that we have interchanged
the positions of the adjacent rows $j$ and $j + 1$. We first consider the term $a_{i1}M_{i1}$,
where $i \ne j$ and $i \ne j + 1$. Then interchanging the $j$th and $(j + 1)$st rows does not affect the
element $a_{i1}$. As for the minor $M_{i1}$, it contains the elements of both the $j$th and
$(j + 1)$st rows of the original matrix (other than the first element of each row),
where they again fill two neighboring rows. Therefore, by the induction hypothesis,
the minor $M_{i1}$ changes sign when the rows are transposed. Thus every term $a_{i1}M_{i1}$
with $i \ne j$ and $i \ne j + 1$ changes sign with a transposition of the $j$th and $(j + 1)$st
rows. The remaining terms have the form

$$(-1)^{j+1}a_{j1}M_{j1} + (-1)^{j+2}a_{j+1\,1}M_{j+1\,1}
= (-1)^{j+1}\left(a_{j1}M_{j1} - a_{j+1\,1}M_{j+1\,1}\right). \tag{2.18}$$

With a transposition of the $j$th and $(j + 1)$st rows, it is easily seen that the terms
$a_{j1}M_{j1}$ and $a_{j+1\,1}M_{j+1\,1}$ exchange places, which means that the entire expression
(2.18) changes sign. This proves Property 2.6. □


In what follows, a prominent role will be played by the square matrices

$$E = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}, \tag{2.19}$$

all of whose elements on the main diagonal are equal to 1 and all of whose nondiagonal
elements are equal to zero. Such a matrix $E$ is called an identity matrix. Of
course, for every natural number $n$ there exists an identity matrix of order $n$, and
when we wish to emphasize the order of the identity matrix under consideration, we
shall write $E_n$.


Property 2.7 The determinant of the identity matrix $E_n$, for all $n \ge 1$, is equal to 1.

Proof In formula (2.13), $a_{i1} = 0$ if $i \ne 1$, and $a_{11} = 1$. Therefore, $|E| = M_{11}$. The
determinant $M_{11}$ has the same structure as $|E|$, but its order is $n - 1$. By the induction
hypothesis, we may assume that $M_{11} = 1$, which means that $|E| = 1$. □

In proving Properties 2.4, 2.6, and 2.7, it was necessary to use definition (2.13).
Now we shall prove a series of properties of the determinant that can be formally
derived from these first three properties.

Property 2.8 If all the elements of a row of a matrix are equal to 0, then the determinant
of the matrix is equal to 0.

Proof Let $a_{i1} = a_{i2} = \cdots = a_{in} = 0$. We may set $a_{ik} = pb_{ik}$, where $p = 0$, $b_{ik} \ne 0$,
$k = 1, \ldots, n$, and apply the first assertion of Corollary 2.5. We obtain that $|A| =
p|A'|$, where $|A'|$ is some other determinant and the number $p$ is equal to zero. We
conclude that $|A| = 0$. □

Property 2.9 If we transpose any two (not necessarily adjacent) rows of a determinant,
then the determinant changes sign.

Proof Let us transpose the $i$th and $j$th rows, where $i < j$. The same result can be
achieved by successively transposing adjacent rows. Namely, we begin by transposing
the $i$th and $(i + 1)$st rows, then the $(i + 1)$st and $(i + 2)$nd, and so on until
the $i$th row has been moved adjacent to the $j$th row, that is, into the $(j - 1)$st
position. At this point, we have carried out $j - i - 1$ transpositions of adjacent
rows. Then we transpose the $(j - 1)$st and $j$th rows, thereby increasing the number
of transpositions to $j - i$. We then transpose the $j$th row with its successive
neighbors so that it occupies the $i$th position, which takes a further $j - i - 1$
transpositions. In the end, we will have exchanged the positions of the $i$th and $j$th
rows, with all other rows occupying their original positions. In carrying out this
process, we have transposed adjacent rows
$(j - i - 1) + 1 + (j - i - 1) = 2(j - i - 1) + 1$ times. This is an odd number.
Therefore, by Property 2.6, according to which each transposition of adjacent rows
changes the sign of the determinant, the result of all transpositions in this
process is a change in the determinant's sign. □

Property 2.9 can also be stated thus: An elementary operation of type I on the
rows of a determinant changes its sign.

Property 2.10 If two rows of a matrix $A$ are equal, then the determinant $|A|$ is equal
to zero.

Proof Let us transpose the two equal rows of $A$. Then obviously, the determinant
$|A|$ does not change. But by Property 2.9, the determinant changes sign. Hence
we have $|A| = -|A|$, that is, $2|A| = 0$, from which we may conclude that $|A| = 0$. □

Property 2.11 If an elementary operation of type II is performed on a determinant,
it is unchanged.

Proof Suppose that after adding $c$ times the $j$th row of $A$ to the $i$th row, we have
the determinant $|A'|$. Its $i$th row is the sum of two rows, and by the second assertion
of Corollary 2.5, we have the equality $|A'| = D_1 + D_2$, where $D_1 = |A|$. As for
the determinant $D_2$, it differs from $|A|$ in that in the $i$th row, it has $c$ times the
$j$th row. The factor $c$ can be taken outside the determinant by the first assertion
of Corollary 2.5. Then we have a determinant whose $i$th and $j$th rows are equal.
But by Property 2.10, such a determinant is equal to zero. Hence $D_2 = 0$, and so
$|A'| = |A|$. □
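The row properties proved above can be observed on a concrete matrix. The following sketch is only a spot check on one example, not a proof; the function name `det` and the sample matrix are assumptions made for illustration.

```python
def det(a):
    """Determinant by expansion along the first column, formula (2.13)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** k * a[k][0] * det([r[1:] for i, r in enumerate(a) if i != k])
               for k in range(len(a)))


a = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]
d = det(a)

scaled = [a[0], [7 * x for x in a[1]], a[2]]                    # Corollary 2.5, part 1
swapped = [a[2], a[1], a[0]]                                     # Property 2.9
repeated = [a[0], a[1], a[0]]                                    # Property 2.10
added = [[x + 3 * y for x, y in zip(a[0], a[2])], a[1], a[2]]    # Property 2.11

print(d, det(scaled), det(swapped), det(repeated), det(added))
```

The five printed values should be $|A|$, $7|A|$, $-|A|$, $0$, and $|A|$, in that order.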


We remark that the properties proven above give us a very simple method for
computing a determinant of order $n$. We have only to apply elementary operations
to bring the matrix $A$ into upper triangular form:

$$\overline{A} = \begin{pmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1n} \\
0 & \bar a_{22} & \cdots & \bar a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \bar a_{nn}
\end{pmatrix}.$$

Let us suppose that in the process of doing this, we have completed $t$ elementary
operations of type I and some number of operations of type II. Since operations
of type II do not change the determinant, and an operation of type I multiplies the
determinant by $-1$, we have $|\overline{A}| = (-1)^t|A|$. We shall now show that

$$|\overline{A}| = \bar a_{11}\bar a_{22}\cdots\bar a_{nn}. \tag{2.20}$$

Then

$$|A| = (-1)^t\,\bar a_{11}\bar a_{22}\cdots\bar a_{nn}. \tag{2.21}$$

This is a formula for calculating $|A|$.

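We shall prove formula (2.20) by induction on $n$. Since in the matrix $\overline{A}$, all elements
of the first column except $\bar a_{11}$ are equal to zero, it follows by formula (2.14)
that we have the equality

$$|\overline{A}| = \bar a_{11}|\overline{A}'|, \tag{2.22}$$

in which the determinant

$$|\overline{A}'| = \begin{vmatrix}
\bar a_{22} & \bar a_{23} & \cdots & \bar a_{2n} \\
0 & \bar a_{33} & \cdots & \bar a_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \bar a_{nn}
\end{vmatrix}$$

has a structure analogous to that of the determinant $|\overline{A}|$. By the induction hypothesis,
we obtain the equality $|\overline{A}'| = \bar a_{22}\bar a_{33}\cdots\bar a_{nn}$. Substituting this expression into (2.22)
yields the formula (2.20) for $|\overline{A}|$.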
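Formula (2.21) describes a practical algorithm: reduce to upper triangular form, count the row interchanges, and multiply the diagonal. A minimal sketch follows, assuming exact rational arithmetic via `fractions.Fraction`; the function name `det_by_elimination` and the pivot choice (the first nonzero entry in the column) are illustrative assumptions rather than anything prescribed by the text.

```python
from fractions import Fraction


def det_by_elimination(a):
    """Determinant via formula (2.21): (-1)**t times the product of the diagonal
    of the triangular form, where t counts operations of type I (row swaps)."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    t = 0
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # the diagonal entry stays zero, so the product is 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]     # operation of type I
            t += 1
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Operation of type II: subtract a multiple of the pivot row.
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    product = Fraction(1)
    for i in range(n):
        product *= m[i][i]
    return (-1) ** t * product


print(det_by_elimination([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
print(det_by_elimination([[0, 1], [1, 0]]))
```

Unlike the recursive definition, this procedure needs on the order of $n^3$ arithmetic operations, which is why it is the method one would use in practice for large $n$.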

The properties of determinants that we have proved allow us to conclude an important
theorem on linear equations.

Theorem 2.12 A system of $n$ equations in $n$ unknowns has a unique solution if and
only if the determinant of the matrix of the system is different from zero.

Proof We bring the system into triangular form:

$$\begin{aligned}
\bar a_{11}x_1 + \bar a_{12}x_2 + \cdots + \bar a_{1n}x_n &= \bar b_1, \\
\bar a_{22}x_2 + \cdots + \bar a_{2n}x_n &= \bar b_2, \\
\cdots\cdots\cdots & \\
\bar a_{nn}x_n &= \bar b_n.
\end{aligned}$$

By Corollary 1.19, the system has a unique solution if and only if

$$\bar a_{11} \ne 0, \quad \bar a_{22} \ne 0, \quad \ldots, \quad \bar a_{nn} \ne 0. \tag{2.23}$$

On the other hand, by formula (2.21), the determinant of the matrix of the system is
equal, up to sign, to the product $\bar a_{11}\bar a_{22}\cdots\bar a_{nn}$, and it follows that it is
different from zero if and only if (2.23) is satisfied. □

Corollary 2.13 A homogeneous system of $n$ equations in $n$ unknowns has a nonzero
solution if and only if the determinant of the matrix of the system is equal to zero.

This result is an obvious consequence of the theorem, since a homogeneous system
of equations always has at least one solution, namely the null solution.

Definition 2.14 A square matrix whose determinant is nonzero is said to be nonsingular.
Conversely, a matrix whose determinant is equal to zero is singular.

In Sect. 2.1, we interpreted the determinant of order two as the area of a triangle
in the plane, while a $3 \times 3$ determinant was viewed as the volume of a tetrahedron
in three-dimensional space (with suitable coefficients). Clearly, the area of a triangle
reduces to zero only if it degenerates into a line segment, and the volume of a
tetrahedron is zero only if the tetrahedron degenerates into a planar figure.

Such examples give an idea of the geometric sense of the singularity of a matrix.
The notion of singularity will become clearer in Sect. 2.10, when we introduce the
notion of inverse matrix, and most importantly, in subsequent chapters when we
consider linear transformations of vector spaces.


2.3 Properties that Characterize Determinants

In the preceding section we said that the determinant is a function that assigns to a
square matrix a number, and we proved two important properties of the determinant:

1. The determinant is a linear function of the elements in each row.

2. Transposing two rows of a determinant changes its sign.

We shall now show that the determinant is in fact completely characterized by these
properties, as formulated in the following theorem.

Theorem 2.15 Let $F(A)$ be a function that assigns to a square matrix $A$ of order $n$
a certain number. If this function satisfies properties 1 and 2 above, then there exists
a number $k$ such that

$$F(A) = k|A|. \tag{2.24}$$

In this case, the number $k$ is equal to $F(E)$, where $E$ is the identity matrix.


Proof First of all, we observe that from properties 1 and 2 it follows that the function
$F(A)$ is unchanged if we apply to the matrix $A$ an elementary operation of type II,
and that it changes sign if we apply an elementary operation of type I. Indeed, the
corresponding properties of the determinant (Properties 2.9 and 2.11 of Sect. 2.2)
were derived formally from properties 1 and 2 alone, and the same arguments apply
to $F$.

Let us now bring matrix $A$ into echelon form using elementary operations. We
write the matrix thus obtained in the form

$$\overline{A} = \begin{pmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1n} \\
0 & \bar a_{22} & \cdots & \bar a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \bar a_{nn}
\end{pmatrix}, \tag{2.25}$$

whereby we do not, however, assert that $\bar a_{11} \ne 0, \ldots, \bar a_{nn} \ne 0$. Such a form can
always be obtained, since for a square matrix in echelon form, all elements $\bar a_{ij}$,
$i > j$, that is, those below the main diagonal, are equal to zero. Let us assume that
in the transition from $A$ to $\overline{A}$, we have performed $t$ elementary operations of type I,
while all the other operations were of type II. Since under an elementary operation
of type II neither $F(A)$ nor $|A|$ is changed, and under elementary operations of
type I, both expressions change sign, it follows that

$$|A| = (-1)^t|\overline{A}|, \qquad F(A) = (-1)^tF(\overline{A}). \tag{2.26}$$

In order to prove formula (2.24) in the general case, it now suffices to prove it for
matrices $\overline{A}$ of the form (2.25), that is, to establish the equality $F(\overline{A}) = k|\overline{A}|$, which,
in turn, clearly follows from the relationships

$$|\overline{A}| = \bar a_{11}\bar a_{22}\cdots\bar a_{nn}, \qquad F(\overline{A}) = F(E)\cdot\bar a_{11}\bar a_{22}\cdots\bar a_{nn}. \tag{2.27}$$

We observe that the first of these equalities is precisely the equality (2.20) from
the previous section. Moreover, it is a consequence of the second equality, since
the determinant $|A|$, as we have shown, is also a function of type $F(A)$, possessing
properties 1 and 2. And therefore, having proved the second equality in (2.27) for an
arbitrary function $F(A)$ possessing the given properties, we shall have proved it again
for the determinant.


It thus remains only to prove the second equality of (2.27). In view of property 1,
we can take out from $F(\overline{A})$ the factor $\bar a_{nn}$:

$$F(\overline{A}) = \bar a_{nn}\cdot F\begin{pmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1n} \\
0 & \bar a_{22} & \cdots & \bar a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.$$

Let us now add to rows $1, 2, \ldots, n - 1$ the last row multiplied by the numbers
$-\bar a_{1n}, -\bar a_{2n}, \ldots, -\bar a_{n-1\,n}$ respectively. In this case, all elements, except the elements
of the last column, are unchanged, and all the elements of the last column become
equal to zero, with the exception of the $n$th, which remains equal to 1. Then let us
apply analogous transformations to the matrix of smaller size with elements located
in the first $n - 1$ rows and columns, and so on. Each time, the number $\bar a_{ii}$ is factored
out of $F$, and the argument is repeated. After doing this $n$ times, we obtain

$$F(\overline{A}) = \bar a_{nn}\cdots\bar a_{11}\cdot F\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix},$$

which is the second equality of (2.27).



2.4 Expansion of a Determinant Along Its Columns

On the basis of Theorem 2.15, we can answer a question that arose earlier, in
Sect. 2.2: does the first column play a special role in (2.12) and (2.13) for a determinant
of order $n$? To answer this question, let us form an expression analogous
to (2.13), but taking instead of the first column, the $j$th column. In other words, let
us consider the function

$$F(A) = a_{1j}M_{1j} - a_{2j}M_{2j} + \cdots + (-1)^{n+1}a_{nj}M_{nj}. \tag{2.28}$$

It is clear that this function assigns to every matrix $A$ of order $n$ a specific number.
Let us verify that it satisfies conditions 1 and 2 of the previous section. To this end,
we have simply to examine the proofs of the properties from Sect. 2.2 and convince
ourselves that we never used the fact that it was precisely the elements of the first
column that were multiplied by their respective minors. In other words, the proofs
of these properties apply word for word to the function $F(A)$. By Theorem 2.15,
we have $F(A) = k|A|$, and we have only to determine the number $k$ from the formula
$k = F(E)$.

For the matrix $E$, all the elements $a_{ij}$ are equal to zero whenever $i \ne j$, and
the elements $a_{jj}$ are equal to 1. Therefore, formula (2.28) reduces to the equality
$F(E) = \pm M_{jj}$. Since in formula (2.28) the signs alternate, the term $a_{jj}M_{jj}$ appears
with the sign $(-1)^{j+1}$. Clearly, $M_{jj}$ is the determinant of the identity matrix $E$ of
order $n - 1$, and therefore, $M_{jj} = 1$. As a result, we obtain that $k = (-1)^{j+1}$, which
means that

$$a_{1j}M_{1j} - a_{2j}M_{2j} + \cdots + (-1)^{n+1}a_{nj}M_{nj} = (-1)^{j+1}|A|.$$

We now move the coefficient $(-1)^{j+1}$ to the left-hand side:

$$|A| = (-1)^{j+1}a_{1j}M_{1j} + (-1)^{j+2}a_{2j}M_{2j} + \cdots + (-1)^{j+n}a_{nj}M_{nj}. \tag{2.29}$$

We see that the element $a_{ij}$ is multiplied by the expression $(-1)^{i+j}M_{ij}$, which is
called its cofactor and denoted by $A_{ij}$. We have therefore obtained the following
result.

Theorem 2.16 The determinant of a matrix $A$ is equal to the sum of the elements
from any of its columns each multiplied by its associated cofactor:

$$|A| = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}. \tag{2.30}$$

In this statement, each column plays an identical role to that played by any other
column. For the first column, it becomes the formula that defines the determinant.
Formulas (2.29) and (2.30) are called the expansion of the determinant along the
$j$th column.

As an application of Theorem 2.16, we can obtain a whole series of new properties
of determinants.
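The expansion along an arbitrary column can be checked against the definition. In the sketch below the names `minor`, `cofactor`, `expand_along_column`, and `alternating_sum` are illustrative assumptions, and indices are counted from 0, so the sign $(-1)^{i+j}$ has the same parity as in the 1-based numbering of the text, while the factor $k = (-1)^{j+1}$ becomes `(-1) ** j`.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by expansion along the first column, formula (2.13)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def cofactor(a, i, j):
    """Cofactor A_ij = (-1)^(i+j) M_ij."""
    return (-1) ** (i + j) * det(minor(a, i, j))


def expand_along_column(a, j):
    """Right-hand side of (2.30) for column j."""
    return sum(a[i][j] * cofactor(a, i, j) for i in range(len(a)))


def alternating_sum(a, j):
    """The function F(A) of (2.28): signs +, -, +, ... whatever the column."""
    return sum((-1) ** i * a[i][j] * det(minor(a, i, j)) for i in range(len(a)))


a = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print([expand_along_column(a, j) for j in range(3)])
print([alternating_sum(a, j) for j in range(3)])
```

The first list repeats $|A|$ for every column, while the second alternates in sign, which is exactly the difference between (2.28) and (2.29).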

Theorem 2.17 Properties 2.4, 2.6, 2.7, 2.8, 2.9, 2.10, 2.11 and all their corollaries
hold not only for the rows of a determinant, but for the columns as well.

Proof It follows from formula (2.30) that the determinant is a linear function of the
elements of the $j$th column, $j = 1, \ldots, n$. Consequently, Property 2.4 holds for the
columns.

We shall prove Property 2.6 by induction on the order $n$ of the determinant. For
$n = 1$, the assertion is empty. For $n = 2$, it can be checked using formula (2.3). Now
let $n > 2$, and let us assume that we have transposed columns numbered $k$ and $k + 1$.
We make use of formula (2.30) for $j \ne k, k + 1$. Then both the $k$th and the $(k + 1)$st
columns enter into every minor $M_{ij}$ ($i = 1, \ldots, n$). By the induction hypothesis, under
a transposition of two columns, each minor will change sign, which means that
the determinant as a whole changes sign, which proves Property 2.6 for columns. We
observe that in Property 2.7, the statement does not discuss rows or columns, and
the remaining properties follow formally from the first three. Therefore, all seven
properties and their corollaries are valid for the columns of a determinant. □

In analogy to Theorem 2.15, from Theorem 2.17 it follows that any multilinear
antisymmetric function² of the columns of a matrix must be proportional to
the determinant function of the matrix. Consequently, we have the analogue of formula
(2.24), where the function $F(A)$ satisfies properties 1 and 2, reformulated for
columns. In this case, the value $k$, as can easily be seen, remains the same. In particular,
for an arbitrary index $i = 1, \ldots, n$, we have the formula, analogous to (2.30),

$$|A| = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}. \tag{2.31}$$

It is called the expansion of the determinant $|A|$ along the $i$th row. The formula
for the column or row expansion of a determinant has a broad generalization that
goes under the name Laplace's theorem. It consists in the fact that one has an analogous
expansion of the determinant of a square matrix of order $n$ not only along a single
column (or row), but for an arbitrary number $m$ of columns, $1 \le m \le n - 1$. For this,
it is necessary only to determine the cofactor not of a single element, but of the minor of
arbitrary order $m$. Laplace's theorem can be proved, for example, by induction on
the number $m$, but we shall not do this, but rather put off its precise formulation and
proof to Sect. 10.5 (p. 379), where it will be obtained as a special case of even more
general concepts and results.
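As a worked illustration of the row expansion (2.31), consider the second row ($i = 2$) of a sample third-order determinant; the matrix is chosen arbitrarily for this example. Its cofactors are $A_{21} = -M_{21}$, $A_{22} = M_{22}$, $A_{23} = -M_{23}$.

```latex
\[
\begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{vmatrix}
= -4 \cdot \begin{vmatrix} 2 & 3 \\ 8 & 10 \end{vmatrix}
+ 5 \cdot \begin{vmatrix} 1 & 3 \\ 7 & 10 \end{vmatrix}
- 6 \cdot \begin{vmatrix} 1 & 2 \\ 7 & 8 \end{vmatrix}
= -4 \cdot (-4) + 5 \cdot (-11) - 6 \cdot (-6) = -3.
\]
```

Expanding the same determinant along its first column by formula (2.13) gives $1 \cdot 2 - 4 \cdot (-4) + 7 \cdot (-3) = -3$, the same value.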


Example 2.18 In Example 1.20 (p. 15), we proved that the problem of interpolation,
that is, the search for a polynomial of degree at most $n$ that passes through $n + 1$ given
points, has a unique solution. Theorem 2.12 shows that the determinant of the matrix
of the corresponding linear system (1.20) is different from zero. Now we can easily
calculate this determinant and once again verify this property.

The determinant of the matrix of system (1.20) for $r = n + 1$ has the form

$$|A| = \begin{vmatrix}
1 & c_1 & c_1^2 & \cdots & c_1^n \\
1 & c_2 & c_2^2 & \cdots & c_2^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & c_n & c_n^2 & \cdots & c_n^n \\
1 & c_{n+1} & c_{n+1}^2 & \cdots & c_{n+1}^n
\end{vmatrix}. \tag{2.32}$$

It is called the Vandermonde determinant of order $n + 1$. We shall show that this
determinant is equal to the product of all differences $c_i - c_j$ for $i > j$, that is, that it
can be written in the following form:

$$|A| = \prod_{i > j}(c_i - c_j). \tag{2.33}$$


We shall prove (2.33) by induction on the number $n$. For $n = 1$, the result is obvious:

$$\begin{vmatrix} 1 & c_1 \\ 1 & c_2 \end{vmatrix} = c_2 - c_1.$$

2 For the definition and a discussion of antisymmetric functions, see Sect. 2.6.

For the proof of the general case, we use the fact that the determinant does not
change under an elementary operation of type II (Property 2.11 from Sect. 2.2), and
moreover, from Theorem 2.17, this property holds for columns as well as for rows.
We will make use of this by subtracting the $n$th column multiplied by $c_1$ from the
$(n + 1)$st, then the $(n - 1)$st multiplied by $c_1$ from the $n$th, and so on, all the way
to the second column, from which we subtract the first multiplied by $c_1$. By the
indicated property, the determinant does not change under these operations, but on
the other hand, it assumes the form

$$\begin{vmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & c_2 - c_1 & c_2(c_2 - c_1) & \cdots & c_2^{n-1}(c_2 - c_1) \\
\vdots & \vdots & \vdots & & \vdots \\
1 & c_n - c_1 & c_n(c_n - c_1) & \cdots & c_n^{n-1}(c_n - c_1) \\
1 & c_{n+1} - c_1 & c_{n+1}(c_{n+1} - c_1) & \cdots & c_{n+1}^{n-1}(c_{n+1} - c_1)
\end{vmatrix}.$$

Making use of Theorem 2.17, we apply to the first row of the determinant thus
obtained (consisting of a single nonzero element) the analogue of formula (2.12).
As a result, we obtain

$$
|A| = \begin{vmatrix}
c_2 - c_1 & c_2(c_2 - c_1) & \cdots & c_2^{n-1}(c_2 - c_1) \\
\vdots & \vdots & \ddots & \vdots \\
c_n - c_1 & c_n(c_n - c_1) & \cdots & c_n^{n-1}(c_n - c_1) \\
c_{n+1} - c_1 & c_{n+1}(c_{n+1} - c_1) & \cdots & c_{n+1}^{n-1}(c_{n+1} - c_1)
\end{vmatrix}.
$$


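To the last determinant let us apply Corollary 2.5 of Sect. 2.2 and remove from
each row its common factor. We obtain

$$
|A| = (c_2 - c_1) \cdots (c_n - c_1)(c_{n+1} - c_1) \cdot
\begin{vmatrix}
1 & c_2 & \cdots & c_2^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & c_n & \cdots & c_n^{n-1} \\
1 & c_{n+1} & \cdots & c_{n+1}^{n-1}
\end{vmatrix}. \tag{2.34}
$$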


The last determinant is a Vandermonde determinant of order n, and by the induction
hypothesis, we can assume that formula (2.33) holds for it. Putting the expression
(2.33) for a Vandermonde determinant of order n into expression (2.34), we obtain
the desired formula (2.33) for a Vandermonde determinant of order n + 1. Since
we have assumed that all the numbers $c_1, \ldots, c_{n+1}$ are distinct, the product of the
differences $c_i - c_j$ for i > j must be different from zero, and we obtain a new proof
of the result that polynomial interpolation as described has a unique solution.
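
As an informal numerical check of formula (2.33), the following sketch in Python compares the determinant of a small Vandermonde matrix with the product of differences. The helper names `det`, `vandermonde`, and `product_of_differences`, as well as the sample values in `cs`, are our own choices for illustration and do not come from the text. The determinant is computed here by Gaussian elimination over exact fractions, not by the inductive argument given above.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    sign = 1
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # interchanging two rows changes the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign * result


def vandermonde(cs):
    """Rows (1, c, c^2, ..., c^n) for each c in cs, as in (2.32)."""
    n = len(cs)
    return [[c ** k for k in range(n)] for c in cs]


def product_of_differences(cs):
    """Product of c_i - c_j over all pairs with i > j, as in (2.33)."""
    prod = Fraction(1)
    for j, i in combinations(range(len(cs)), 2):  # here j < i
        prod *= cs[i] - cs[j]
    return prod


cs = [2, 5, -1, 7]  # sample distinct values (an assumption for illustration)
lhs = det(vandermonde(cs))
rhs = product_of_differences(cs)
print(lhs == rhs, rhs != 0)
```

For distinct sample values the two quantities agree and are nonzero, which is what the uniqueness of the interpolating polynomial requires.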


2.5 Cramer’s Rule

We are now going to derive explicit formulas for the solution of a system of n
equations in n unknowns, formulas for which we have developed the theory of
determinants. The matrix A of this system is a square matrix of order n, and we shall
assume that it is not singular.

Lemma 2.19 The sum of the elements $a_{ij}$ of an arbitrary (here the jth) column of
a determinant, each multiplied by the cofactor $A_{ik}$ corresponding to the elements of
any other column (here the kth), is equal to zero:

$$
a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = 0 \quad \text{for } k \neq j.
$$

Proof We replace the kth column in our determinant |A| with its jth column. As
a result, we obtain a determinant |A'| that, by Property 2.10 of Sect. 2.2 reformulated
for columns, is equal to zero. On the other hand, let us expand the determinant
|A'| along the kth column. Since in forming the cofactors of this column, the
elements of the kth column are deleted, we obtain the same cofactors $A_{ik}$ as in our original
determinant |A|. Therefore, we obtain

$$
|A'| = a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = 0,
$$

which is what we wished to show.



Theorem 2.20 (Cramer’s rule) If the determinant of the matrix of a system of n
equations in n unknowns is different from zero, then its solution is given by

$$
x_k = \frac{D_k}{D}, \quad k = 1, \ldots, n, \tag{2.35}
$$

where D is the determinant of the matrix of the system, and $D_k$ is obtained from D
by replacing the kth column of the matrix with the column of constant terms.


Proof By Theorem 2.12, we know that there is a unique collection of values for
$x_1, \ldots, x_n$ that transforms the system

$$
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\
&\;\;\vdots \\
a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
$$

into identities. Let us determine the unknown $x_k$ for a given k.

To do so, we shall proceed exactly as in the case of systems of two and three
equations from Sect. 2.1: we multiply the ith equation by the cofactor $A_{ik}$ and then
sum all the resulting equations. After this, the coefficient of $x_k$ will have the form

$$
a_{1k}A_{1k} + \cdots + a_{nk}A_{nk} = D.
$$

The coefficient of $x_j$ for $j \neq k$ will assume the form

$$
a_{1j}A_{1k} + \cdots + a_{nj}A_{nk}.
$$

By Lemma 2.19, this number is equal to zero. Finally, for the constant term we
obtain the expression

$$
b_1A_{1k} + \cdots + b_nA_{nk}.
$$

But it is precisely this expression that we obtain if we expand the determinant $D_k$
along its kth column. Therefore, we arrive at the equality

$$
D x_k = D_k,
$$

and since $D \neq 0$, we have $x_k = D_k / D$. This is formula (2.35). □
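
Formula (2.35) translates almost literally into code. The following is a minimal sketch in Python, assuming integer coefficients; the function names `det` and `cramer` and the sample system are our own illustrative choices, and the determinant is computed by cofactor expansion along the first column, which is practical only for small n.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by cofactor expansion along the first column (small n only)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for i in range(n):
        # Delete row i and the first column to form the minor.
        minor = [row[1:] for r, row in enumerate(matrix) if r != i]
        total += (-1) ** i * matrix[i][0] * det(minor)
    return total


def cramer(a, b):
    """Solve a x = b by formula (2.35): x_k = D_k / D."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular; Cramer's rule does not apply")
    solution = []
    for k in range(len(a)):
        # D_k: replace the kth column of the matrix with the constant terms.
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution


# Sample system (an assumption for illustration):
#    2x +  y -  z =   8
#   -3x -  y + 2z = -11
#   -2x +  y + 2z =  -3
a = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
x = cramer(a, b)
print(x)
```

Substituting the computed values back into the three equations confirms that they form a solution. For large systems, elimination is far cheaper than evaluating n + 1 determinants in this way.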


2.6 Permutations, Symmetric and Antisymmetric Functions 

A careful study of the properties of determinants leads to a number of important
mathematical concepts relating to arbitrary finite sets that in fact could have been
presented earlier.

Let us recall that in Sect. 1.1 we studied linear functions as functions of rows
of length n. In Sect. 2.2 we looked at determinants as functions of square
matrices. If we are interested in the dependence of the determinant on the rows of
its underlying matrix, then it is possible to consider it as a function of its n rows:
$|A| = F(\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_n)$, where for the matrix

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
$$

we denote by $\boldsymbol{a}_i$ its ith row: $\boldsymbol{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})$.


Here we encounter the notion of a function $F(a_1, a_2, \ldots, a_n)$ of n elements of a set
M as a rule that assigns to any n elements from M, taken in a particular order, some
element of another set N. Thus, F is a mapping from $M^n$ to N (see p. xvii). In our
case, M is the set of all rows of fixed length n, and N is the set of all numbers.

Let us introduce some necessary notation for the sequel. Let M be a finite set
consisting of n elements $a_1, a_2, \ldots, a_n$.

Definition 2.21 A function on the n elements of a set M is said to be symmetric if
it is unchanged under an arbitrary rearrangement of its arguments.

After numbering the n elements of the set M with the indices 1, 2, ..., n, we can
consider that we have arranged them in order of increasing index. A permutation of
them can be considered a rearrangement in another order, which we shall write as
follows. Let $j_1, j_2, \ldots, j_n$ represent the same numbers 1, 2, ..., n, but perhaps listed
in a different order. In this case, we shall say that $(j_1, j_2, \ldots, j_n)$ is a permutation
of the numbers (1, 2, ..., n). Analogously, we shall say that $(a_{j_1}, a_{j_2}, \ldots, a_{j_n})$ is a
permutation of the elements $(a_1, a_2, \ldots, a_n)$.

Thus the definition of a symmetric function can be written as the equality

$$
F(a_{j_1}, a_{j_2}, \ldots, a_{j_n}) = F(a_1, a_2, \ldots, a_n) \tag{2.36}
$$

for all permutations $(j_1, j_2, \ldots, j_n)$ of the numbers (1, 2, ..., n).

In order to determine whether one is dealing with a symmetric function, it is not
necessary to verify equality (2.36) for all permutations $(j_1, j_2, \ldots, j_n)$, but instead
we can limit ourselves to certain permutations of the simplest form.

Definition 2.22 A permutation that interchanges two elements of the set $(a_1, a_2, \ldots, a_n)$
and leaves all the others in place is called a transposition.

A transposition under which the ith and jth elements (that is, $a_i$ and $a_j$) are
transposed will be denoted by $\tau_{i,j}$. Clearly, we may always assume that i < j.

We have the following simple fact about permutations.

Theorem 2.23 From any arrangement $(i_1, i_2, \ldots, i_n)$ of distinct natural numbers
taking values from 1 to n, it is possible to obtain an arbitrary permutation
$(j_1, j_2, \ldots, j_n)$ by carrying out a certain number of transpositions.

Proof We shall use induction on n. For n = 1, the assertion of the theorem is a
tautology: there exists only one permutation, and so it is unnecessary to introduce any
transpositions at all. In the general case (n > 1), let us suppose that $j_1$ stands at the
kth position in the permutation $(i_1, i_2, \ldots, i_n)$, that is, $j_1 = i_k$. We perform the
transposition $\tau_{1,k}$ on this permutation. (If $j_1 = i_1$, then it is not necessary to perform
any transposition at all.) We obtain the permutation $(j_1, i_2, \ldots, i_1, \ldots, i_n)$, where $j_1$
is in the first position, and $i_1$ is in the kth position. Now we need to use transpositions
to obtain from the permutation $(j_1, i_2, \ldots, i_1, \ldots, i_n)$ the second permutation,
$(j_1, j_2, \ldots, j_n)$, given in the statement of the theorem.

If we cancel $j_1$ from both permutations, then what remains is a permutation of
the numbers $\alpha$ such that $1 \le \alpha \le n$ and $\alpha \neq j_1$. To these two permutations, now
consisting of only n − 1 numbers, we can apply the induction hypothesis and obtain
the second permutation from the first. Beginning with the transposition $\tau_{1,k}$, we can
thus obtain from the permutation $(i_1, i_2, \ldots, i_n)$ the permutation $(j_1, j_2, \ldots, j_n)$.
In some cases, it will not be necessary to apply a transposition (for example, if
$j_1 = i_1$). The limiting case can also be encountered in which it will not be necessary
to use any transpositions at all. It is easy to see that such occurs only for $i_1 = j_1$,
$i_2 = j_2$, ..., $i_n = j_n$. The assertion of the theorem is true in this case, but the set of
transpositions used is empty. □

This very simple argument can be illustrated as follows. Let us suppose that at a
concert, the invited guests sit down in the first row, but not in the order indicated on
the administrator’s guest list. How can he achieve the requisite ordering? Obviously,
he may identify the guest who should be sitting in the first position and ask that
person to change seats with the person sitting in the first chair. He will then do
likewise with the guests who occupy the second, third, and so on, places, and in the
end will have achieved the required order.
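
The procedure from the proof of Theorem 2.23 (and from the concert illustration) can be sketched as a short Python routine. The names `transpositions_between` and `apply_transpositions` and the sample arrangements are hypothetical, chosen only for this illustration. Positions are numbered from 1, as in the text.

```python
def transpositions_between(start, target):
    """Return position pairs (p, q), p < q, whose successive application
    to `start` yields `target`. Both must list the same distinct items."""
    current = list(start)
    swaps = []
    for p in range(len(current)):
        if current[p] != target[p]:
            # The wanted item sits somewhere to the right, since all earlier
            # positions already agree with the target.
            q = current.index(target[p], p + 1)
            current[p], current[q] = current[q], current[p]
            swaps.append((p + 1, q + 1))  # record 1-based positions
    return swaps


def apply_transpositions(arrangement, swaps):
    """Apply transpositions of positions (1-based) one after another."""
    result = list(arrangement)
    for p, q in swaps:
        result[p - 1], result[q - 1] = result[q - 1], result[p - 1]
    return result


start = (3, 1, 4, 2)
target = (1, 2, 3, 4)
swaps = transpositions_between(start, target)
print(swaps)
print(apply_transpositions(start, swaps))
```

At most n − 1 transpositions are produced, and none at all when the two arrangements already coincide, which is the limiting case mentioned at the end of the proof.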

It follows from Theorem 2.23 that in determining that a function is symmetric,
it suffices to verify equality (2.36) for permutations obtained from the permutation
(1, 2, ..., n) by a single transposition, that is, to check that

$$
F(a_1, \ldots, a_i, \ldots, a_j, \ldots, a_n) = F(a_1, \ldots, a_j, \ldots, a_i, \ldots, a_n)
$$

for arbitrary $a_1, \ldots, a_n$, i, and j. Indeed, if this property is satisfied, then applying
various transpositions successively to the arguments of the function $F(a_1, \ldots, a_n)$,
we will always obtain the same value, and by Theorem 2.23, we will finally
obtain the value $F(a_{j_1}, \ldots, a_{j_n})$.

For example, for n = 3, we have three transpositions: $\tau_{1,2}$, $\tau_{2,3}$, $\tau_{1,3}$. For the
function $F(a_1, a_2, a_3) = a_1a_2 + a_1a_3 + a_2a_3$, for example, under the transposition
$\tau_{1,2}$, the term $a_1a_2$ remains unchanged, but the other two terms exchange places. The
same sort of thing happens for the other transpositions. Therefore, our function is
symmetric.
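
The example just given can be spot-checked numerically. The sketch below, in Python, evaluates the function at one sample point (the values in `sample` are an arbitrary assumption) under the three transpositions and under all six permutations. Such a check at a single point is of course not a proof of symmetry, only an illustration of what equality (2.36) asserts.

```python
from itertools import permutations


def F(a1, a2, a3):
    """The function a1*a2 + a1*a3 + a2*a3 from the example."""
    return a1 * a2 + a1 * a3 + a2 * a3


def swap(args, i, j):
    """Apply the transposition of positions i and j (1-based) to a tuple."""
    lst = list(args)
    lst[i - 1], lst[j - 1] = lst[j - 1], lst[i - 1]
    return tuple(lst)


sample = (2, -5, 7)  # arbitrary sample arguments

unchanged_by_transpositions = all(
    F(*swap(sample, i, j)) == F(*sample) for i, j in [(1, 2), (2, 3), (1, 3)]
)
unchanged_by_all = all(F(*p) == F(*sample) for p in permutations(sample))
print(unchanged_by_transpositions, unchanged_by_all)
```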

We now consider a class of functions that in a certain sense are the opposite of 
symmetric. 

Definition 2.24 A function on n elements of a set M is said to be antisymmetric if
under a transposition of its elements it changes sign.

In other words,

$$
F(a_1, \ldots, a_i, \ldots, a_j, \ldots, a_n) = -F(a_1, \ldots, a_j, \ldots, a_i, \ldots, a_n)
$$

for any $a_1, \ldots, a_n$, i, and j.

The notions of symmetric and antisymmetric function play an extremely important
role in mathematics and mathematical physics. For example, in quantum mechanics,
the state of a certain physical quantity in a system consisting of n (generally
a very large number) elementary particles $p_1, \ldots, p_n$ of a single type is described
by a wave function $\psi(p_1, \ldots, p_n)$ that depends on these particles and assumes complex
values. In a certain sense, in the “general case,” a wave function is symmetric
or antisymmetric, and which of these two possibilities is realized depends only on
the type of particle: photons, electrons, and so on. If the wave function is symmetric,
then the particles are called bosons, and in this case, we say that the quantum-mechanical
system under consideration is subordinate to the Bose-Einstein statistics.
On the other hand, if the wave function is antisymmetric, then the particles
are called fermions, and we say that the system is subordinate to the Fermi-Dirac
statistics.³


3 For example, photons are bosons, and the particles that make up the atom (electrons, protons,
and neutrons) are fermions.

We shall return to a consideration of symmetric and antisymmetric functions in
the closing chapters of this book. For now, we would like to answer the following
question: How is an antisymmetric function transformed under an arbitrary permutation
of the indices? In other words, we would like to express $F(a_{i_1}, \ldots, a_{i_n})$ in terms
of $F(a_1, \ldots, a_n)$ for an arbitrary permutation $(i_1, \ldots, i_n)$ of the indices (1, ..., n).
To answer this, we again turn to Theorem 2.23, according to which the permutation
$(i_1, \ldots, i_n)$ can be obtained from the permutation (1, ..., n) via a certain number
(k, let us say) of transpositions. However, the hallmark of an antisymmetric function
is that it changes sign under the transposition of two of its arguments. After k
transpositions, therefore, it will have been multiplied by the sign $(-1)^k$, and we obtain
the relationship

$$
F(a_{i_1}, \ldots, a_{i_n}) = (-1)^k F(a_1, \ldots, a_n), \tag{2.37}
$$

where the collection of elements $a_{i_1}, \ldots, a_{i_n}$ from the set M is obtained from the
collection $a_1, \ldots, a_n$ by means of the permutation under consideration consisting
of k transpositions.

The relationship (2.37) has about it a certain ambiguity. Namely, the number k
indicates the number of transpositions that are executed in passing from (1, ..., n)
to the permutation $(i_1, \ldots, i_n)$. But such a passage can in general be accomplished
in a variety of ways, and so the required number k of transpositions can assume
a number of different values. For example, to pass from (1, 2, 3) to the permutation
(3, 2, 1), we could begin with the transposition $\tau_{1,2}$, obtaining (2, 1, 3). Then
we could apply the transposition $\tau_{2,3}$ and arrive at the permutation (2, 3, 1). And
finally, again carrying out the transposition $\tau_{1,2}$, we would arrive at the permutation
(3, 2, 1). Altogether, we carried out three transpositions. On the other hand, we can
carry out a single transposition ($\tau_{1,3}$), which from (1, 2, 3) gives us immediately the
permutation (3, 2, 1). Nevertheless, let us note that we have not produced any inconsistency
with (2.37), since both values of k, namely 3 and 1, are odd, and therefore
in both cases, the coefficient $(-1)^k$ has the same value.
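
The two routes from (1, 2, 3) to (3, 2, 1) described above can be replayed mechanically. In the following Python sketch, the function `apply` and the names `long_route` and `short_route` are our own; each pair (i, j) stands for the transposition of the elements in positions i and j.

```python
def apply(arrangement, swaps):
    """Apply transpositions of positions (1-based) one after another."""
    result = list(arrangement)
    for i, j in swaps:
        result[i - 1], result[j - 1] = result[j - 1], result[i - 1]
    return tuple(result)


long_route = [(1, 2), (2, 3), (1, 2)]  # three transpositions
short_route = [(1, 3)]                 # a single transposition

end_long = apply((1, 2, 3), long_route)
end_short = apply((1, 2, 3), short_route)

# The coefficient (-1)^k from (2.37) for each route.
sign_long = (-1) ** len(long_route)
sign_short = (-1) ** len(short_route)
print(end_long, end_short, sign_long, sign_short)
```

Both routes end at the same permutation and give the same coefficient, although the numbers of transpositions differ.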

Let us show that the parity of the number of transpositions used in passing from
one given permutation to another depends only on the permutations themselves
and not on the choice of transpositions. Let us suppose that we have an antisymmetric
function $F(a_1, \ldots, a_n)$ that depends on n elements of a set M and is not
identically zero. This last assumption means that there exists a set of distinct
elements $a_1, \ldots, a_n$ from the set M such that $F(a_1, \ldots, a_n) \neq 0$. On applying the
permutation $(i_1, \ldots, i_n)$ to this set of elements, we obtain $(a_{i_1}, \ldots, a_{i_n})$, with the
values $F(a_1, \ldots, a_n)$ and $F(a_{i_1}, \ldots, a_{i_n})$ related by (2.37). If we can obtain the
permutation $(i_1, \ldots, i_n)$ from (1, ..., n) in two different ways, that is, using k and l
transpositions, then from formula (2.37) we have the equality $(-1)^k = (-1)^l$, since
$F(a_1, \ldots, a_n) \neq 0$, and therefore the numbers k and l have the same parity, that is,
either both are even or both are odd.

But there is a function known to us that possesses this property, namely the determinant
(as a function of the rows of a matrix)! Indeed, Property 2.9 from Sect. 2.2
asserts that the determinant is an antisymmetric function of its rows. This function
is nonzero for some $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_n$. For example, |E| = 1. In other words, to prove our
assertion, it suffices to consider the determinant of the matrix E as an antisymmetric
function of its n rows $\boldsymbol{e}_i = (0, \ldots, 1, \ldots, 0)$, where there is a 1 in the ith place
and zeros in the other places, for i = 1, ..., n. (In the course of our argument, these
rows will be transposed, so that in fact, we shall consider determinants of matrices
more complex than E.) Thus by a rather roundabout route, using properties of the
determinant, we have obtained the following property of permutations.

Theorem 2.25 For any passage from the permutation (1, ..., n) to the permutation
$J = (j_1, \ldots, j_n)$ by means of transpositions (which is always possible, thanks to
Theorem 2.23), the parity of the number of transpositions will be the same as for
any other passage between these two permutations.

Thus the set of all permutations of n items can be divided into two classes: those
that can be obtained from the permutation (1, ..., n) by means of an even number of
transpositions and those that can be obtained with an odd number of transpositions.
Permutations of the first type are called even, and those of the second type are called
odd. If some permutation J is obtained by k transpositions, then we introduce the
notation

$$
\varepsilon(J) = (-1)^k.
$$

In other words, for an even permutation J, the number $\varepsilon(J)$ is equal to 1, and for
an odd permutation, we have $\varepsilon(J) = -1$.

We have proved the consistency of the notion of even and odd permutation in a
rather roundabout way, using the properties of the determinant. In fact, it would have
sufficed for us to produce any antisymmetric function not identically zero, and we
used one that was familiar to us: the determinant as a function of its rows. We could
have invoked a simpler function. Let M be a set of numbers, and for $x_1, \ldots, x_n \in M$,
we set

$$
F(x_1, \ldots, x_n) = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1) \cdots (x_n - x_{n-1})
= \prod_{i > j} (x_i - x_j). \tag{2.38}
$$


Let us verify that this function is antisymmetric. To this end, we introduce the
following lemma.

Lemma 2.26 Any transposition can be obtained as the result of an odd number of
transpositions of adjacent elements, that is, transpositions of the form $\tau_{k,k+1}$.

We actually proved this statement in essence in Sect. 2.2 when we derived Property 2.9
from Property 2.6. There we did not use the term “transposition,” and instead
we spoke about interchanging the rows of a determinant. But that very simple
proof can be applied to the elements of any set, and therefore we shall not repeat the
argument.

Thus it suffices to prove that the function (2.38) changes sign under the exchange
of $x_k$ and $x_{k+1}$. But in this case, the factors $(x_i - x_j)$ for $i \neq k, k+1$, $j \neq k, k+1$,
on the right-hand side of the equation do not change at all. The factors $(x_i - x_k)$
and $(x_i - x_{k+1})$ for i > k + 1 change places, as do $(x_k - x_j)$ and $(x_{k+1} - x_j)$ for
j < k. There remains a single factor $(x_{k+1} - x_k)$, which changes sign. It
is also clear that the function (2.38) differs from zero for any set of distinct values
$x_1, \ldots, x_n$.
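
Lemma 2.26 can be made concrete as follows. The sketch below, in Python, writes the transposition of positions i and j (with i < j) as a sequence of 2(j − i) − 1 adjacent transpositions: the element in position i is first carried step by step to position j, and the element that originally stood in position j is then carried back to position i. This particular decomposition and the names `adjacent_decomposition` and `apply` are our own illustration, offered under the assumption that it mirrors the row-interchange argument of Sect. 2.2, which is not reproduced here.

```python
def adjacent_decomposition(i, j):
    """Adjacent transpositions (k, k+1) whose successive application
    interchanges positions i and j (1-based, i < j)."""
    up = [(k, k + 1) for k in range(i, j)]                # carry item i up to position j
    down = [(k, k + 1) for k in range(j - 2, i - 1, -1)]  # carry old item j down to position i
    return up + down


def apply(arrangement, swaps):
    """Apply transpositions of positions (1-based) one after another."""
    result = list(arrangement)
    for p, q in swaps:
        result[p - 1], result[q - 1] = result[q - 1], result[p - 1]
    return tuple(result)


steps = adjacent_decomposition(2, 5)
result = apply((1, 2, 3, 4, 5), steps)
print(steps, result)
```

The number of steps, 2(j − i) − 1, is always odd, in agreement with the lemma.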

We can now apply formula (2.37) to the function given by relation (2.38), by
which we proved Theorem 2.25, which means that the notion of the parity of a
permutation is well defined. We note, however, that our “simpler” method is very
close to the “roundabout” way with which we began, since formula (2.38) defines
the Vandermonde determinant of order n (see formula (2.33) in Sect. 2.4). Let us
choose the numbers $x_i$ in such a way that $x_1 < x_2 < \cdots < x_n$ (for example, we may
set $x_i = i$). Then on the right-hand side of relation (2.38), all factors will be positive.

Let us now write down the analogous relation for $F(x_{i_1}, \ldots, x_{i_n})$. Since the
permutation $(i_1, \ldots, i_n)$ assigns the number $x_{i_k}$ to the number $x_k$, from (2.37), we obtain

$$
F(x_{i_1}, \ldots, x_{i_n}) = \prod_{k > l} (x_{i_k} - x_{i_l}). \tag{2.39}
$$

The sign of $F(x_{i_1}, \ldots, x_{i_n})$ is determined by the number of negative factors on the
right-hand side of (2.39). Indeed, $F(x_{i_1}, \ldots, x_{i_n}) > 0$ if the number of negative factors is
even, while $F(x_{i_1}, \ldots, x_{i_n}) < 0$ if it is odd. Negative factors $(x_{i_k} - x_{i_l})$ arise whenever
$x_{i_k} < x_{i_l}$, and in view of the choice $x_1 < x_2 < \cdots < x_n$, this means that $i_k < i_l$.
It follows that to the negative factors $(x_{i_k} - x_{i_l})$ there correspond those pairs of
numbers k and l for which k > l and $i_k < i_l$. In this case, we say that the numbers
$i_k$ and $i_l$ in the permutation $(i_1, \ldots, i_n)$ stand in reverse order, or that they form an
inversion. Thus a permutation is even or odd according to whether it contains an
even or odd number of inversions. For example, in the permutation (4, 3, 2, 5, 1),
the inversions are the pairs (4, 3), (4, 2), (4, 1), (3, 2), (3, 1), (2, 1), (5, 1). In all,
there are seven of them, which means that F(4, 3, 2, 5, 1) < 0, and the permutation
(4, 3, 2, 5, 1) is odd.
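
Counting inversions gives a direct way to compute the parity of a permutation. The following Python sketch lists the inversions of the permutation (4, 3, 2, 5, 1) and compares the resulting sign with the sign of the product (2.38) evaluated at $x_i = i$. The function names `inversions`, `sign`, and `F` are chosen here for illustration.

```python
from itertools import combinations


def inversions(perm):
    """Pairs (perm[l], perm[k]) with l < k that stand in reverse order."""
    return [(perm[l], perm[k])
            for l, k in combinations(range(len(perm)), 2)
            if perm[l] > perm[k]]


def sign(perm):
    """+1 for an even number of inversions, -1 for an odd number."""
    return -1 if len(inversions(perm)) % 2 else 1


def F(xs):
    """The product (2.38) of differences x_i - x_j over all i > j."""
    prod = 1
    for j, i in combinations(range(len(xs)), 2):  # here j < i
        prod *= xs[i] - xs[j]
    return prod


perm = (4, 3, 2, 5, 1)
pairs = inversions(perm)
print(pairs)
print(len(pairs), sign(perm), F(perm))
```

The value F(4, 3, 2, 5, 1) differs from F(1, 2, 3, 4, 5) exactly by the factor given by the sign of the permutation, as relation (2.37) predicts.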

Using these concepts, we can now formulate the following theorem. 

Theorem 2.27 The determinant of a square matrix of order n is the unique function
$F(\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_n)$ of n rows of length n that satisfies the following conditions:

(a) It is linear as a function of an arbitrary row.

(b) It is antisymmetric.

(c) $F(\boldsymbol{e}_1, \boldsymbol{e}_2, \ldots, \boldsymbol{e}_n) = 1$, where $\boldsymbol{e}_i$ is the row with 1 in the ith place and zeros in
all other places.

This is the most “scientific,” though far from the simplest, definition of the determinant.

In this section, we have not presented a single new property of the determinant,
instead discussing in detail its property of being an antisymmetric function of its
rows. The reason for this is that the property of antisymmetry of the determinant
is connected with a large number of questions in mathematics. For example, in
Sect. 2.1, we introduced determinants of orders 2 and 3. They have an important
geometric significance, expressing the area and volume of simple geometric figures
(Figs. 2.1(a) and (b)).

Fig. 2.2 Path length OABC

But here we encounter a paradoxical situation: Sometimes, one obtains for the
area (or volume) a negative value. It is easy to see that we obtain a positive or
negative value for the area of triangle OAB (or the volume of the tetrahedron OABC)
depending on the order of the vertices A, B (or A, B, C). More precisely, the area of
triangle OAB is positive if we can obtain the ray OA from OB by rotating it clockwise
through the triangle, while the area is negative if we obtain OA by rotating
OB counterclockwise through the triangle (in other words, the rotation is always
through an angle of measure less than $\pi$). Thus the determinant expresses the area
of a triangle (with coefficient 1/2) with a given ordering of the sides, and the area
changes sign if we reverse the order. That is, it is an antisymmetric function.

In the case of volume, choosing the order of the vertices is connected to the
concept of orientation of space. The same concept appears as well in hyperspaces
of dimension n > 3, but for now, we shall not go too deeply into such questions;
we shall return to them in Sects. 4.4 and 7.3. Let us say only that this concept is
necessary for constructing the theory of volumes and the theory of integration. In
fact, the notion of orientation arises already in the case n = 1, when we consider
the length of an interval OA (where O is the origin of the line, namely the point 0,
and the point A has the coordinate x) to be the determinant x of order 1, which will
be positive precisely when A lies to the right of O. Analogously, if the point B has
coordinate y, then the length of the segment AB is equal to y − x, which will be
positive only if B lies to the right of A. Thus the length of a segment depends on
the ordering of its endpoints, and it changes sign if the endpoints exchange places
(thus length is an antisymmetric function). It is only by a similar convention that we
can say that the length of OABC is equal to the length of OC (Fig. 2.2). And if we
were to use only positive lengths, then we would end up with the length of OABC
being given by the expression |OA| + |AB| + |BA| + |AC| = |OC| + 2|AB|.
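
The role of the sign can be seen in a small computation. The Python sketch below evaluates half of a determinant of order 2 for a sample triangle OAB in both orders of the vertices, and then compares signed and unsigned lengths along a path O, A, B, C on a line with B lying between O and A. All coordinates are arbitrary sample values, and the function names `signed_area` and `signed_length` are our own. Which vertex order yields the positive value depends on the convention for writing the rows of the determinant, so the sketch checks only that the sign reverses.

```python
from fractions import Fraction


def signed_area(a, b):
    """Half the determinant of order 2 with rows a and b (triangle OAB, O the origin)."""
    return Fraction(a[0] * b[1] - a[1] * b[0], 2)


def signed_length(x, y):
    """Signed length of the segment from coordinate x to coordinate y."""
    return y - x


A, B = (4, 1), (1, 3)  # sample vertices in the plane
area_ab = signed_area(A, B)
area_ba = signed_area(B, A)

# Sample points on a line, visited in the order O, A, B, C.
o, a, b, c = 0, 5, 2, 9
signed_total = signed_length(o, a) + signed_length(a, b) + signed_length(b, c)
unsigned_total = abs(a - o) + abs(b - a) + abs(c - b)
print(area_ab, area_ba, signed_total, unsigned_total)
```

With signed lengths the total equals the length of OC, whereas with positive lengths only it exceeds that length by 2|AB|.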


2.7 Explicit Formula for the Determinant

Formula (2.12), which we used in Sect. 2.2 to compute the determinant of order n,
expresses that determinant in terms of determinants of smaller orders. It is assumed
that this method can be applied in turn to these smaller determinants, and passing
to determinants of smaller and smaller orders, we arrive at a determinant of order 1,
which for the matrix $(a_{11})$ is equal to $a_{11}$. We thereby obtain an expression for the
determinant of the matrix

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
$$

in terms of its elements. This expression is rather complicated, and for deriving
the properties of determinants it is simpler to use the inductive procedure given in
Sect. 2.2. But now we are ready to discover this complicated definition. First of all,
let us prove a lemma, which appears obvious at first glance but nonetheless requires
proof (though it is very simple).

Lemma 2.28 If the linear function $f(\boldsymbol{x})$ for a row $\boldsymbol{x}$ of length n is written in two
ways,

$$
f(\boldsymbol{x}) = \sum_{i=1}^{n} a_i x_i, \qquad f(\boldsymbol{x}) = \sum_{i=1}^{n} b_i x_i,
$$

then $a_1 = b_1$, $a_2 = b_2$, ..., $a_n = b_n$.

Proof Both of the equations for $f(\boldsymbol{x})$ must hold for arbitrary $\boldsymbol{x}$. Let us suppose in
particular that $\boldsymbol{x} = \boldsymbol{e}_i = (0, \ldots, 1, \ldots, 0)$, where 1 is located in the ith position (we
have already encountered the rows $\boldsymbol{e}_i$ in the proof of Theorem 1.3). Then from the
first representation, we obtain that $f(\boldsymbol{e}_i) = a_i$, and from the second, that $f(\boldsymbol{e}_i) = b_i$.
Therefore, $a_i = b_i$ for all i, which is what was to be proved. □

We shall consider the determinant |A| as a function of the rows $\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_n$ of
the matrix A. As shown in Sect. 2.2, the determinant is a linear function of any row
of the matrix. A function of any number m of rows, all of length n, is said to be
multilinear if it is linear in each row (with the other rows held fixed).

Theorem 2.29 A multilinear function $F(\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_m)$ can be expressed in the
form

$$
F(\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_m) = \sum_{(i_1, i_2, \ldots, i_m)} \alpha_{i_1, i_2, \ldots, i_m} \, a_{1 i_1} a_{2 i_2} \cdots a_{m i_m}, \tag{2.40}
$$

if as usual, $\boldsymbol{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})$, and the sum is taken over arbitrary collections
of numbers $(i_1, i_2, \ldots, i_m)$ from the set 1, 2, ..., n, where $\alpha_{i_1, i_2, \ldots, i_m}$ are certain
coefficients that depend only on the function F and not on the rows $\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_m$.

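Proof The proof is by induction on the number m. For m = 1, the assertion of the
theorem is obvious by the definition of a linear function. For m > 1, we shall use
the fact that

$$
F(\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_m) = \sum_{i=1}^{n} \varphi_i(\boldsymbol{a}_2, \ldots, \boldsymbol{a}_m) \, a_{1i} \tag{2.41}
$$

for arbitrary $\boldsymbol{a}_1$, where the coefficients $\varphi_i$ depend on $\boldsymbol{a}_2, \ldots, \boldsymbol{a}_m$; that is, they are
functions of these rows.

Let us verify that all the functions $\varphi_i$ are multilinear. Let us show, for example,
linearity with respect to $\boldsymbol{a}_2$. Using the linearity of the function $F(\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_m)$
with respect to $\boldsymbol{a}_2$, we obtain

$$
F(\boldsymbol{a}_1, \boldsymbol{a}_2' + \boldsymbol{a}_2'', \ldots, \boldsymbol{a}_m) = F(\boldsymbol{a}_1, \boldsymbol{a}_2', \ldots, \boldsymbol{a}_m) + F(\boldsymbol{a}_1, \boldsymbol{a}_2'', \ldots, \boldsymbol{a}_m),
$$

or

$$
\sum_{i=1}^{n} \varphi_i(\boldsymbol{a}_2' + \boldsymbol{a}_2'', \ldots, \boldsymbol{a}_m) \, x_i = \sum_{i=1}^{n} \bigl(\varphi_i(\boldsymbol{a}_2', \ldots, \boldsymbol{a}_m) + \varphi_i(\boldsymbol{a}_2'', \ldots, \boldsymbol{a}_m)\bigr) x_i
$$

for $x_i = a_{1i}$, that is, for arbitrary $x_i$. From this, by the lemma, we obtain

$$
\varphi_i(\boldsymbol{a}_2' + \boldsymbol{a}_2'', \ldots, \boldsymbol{a}_m) = \varphi_i(\boldsymbol{a}_2', \ldots, \boldsymbol{a}_m) + \varphi_i(\boldsymbol{a}_2'', \ldots, \boldsymbol{a}_m).
$$

In precisely the same way, we can verify the second property of linear functions
in Theorem 1.3. From this theorem it is seen that the functions $\varphi_i(\boldsymbol{a}_2, \ldots, \boldsymbol{a}_m)$ are
linear with respect to $\boldsymbol{a}_2$, and analogously that they are multilinear. Now by the
induction hypothesis, we have for each of them the expression

$$
\varphi_i(\boldsymbol{a}_2, \ldots, \boldsymbol{a}_m) = \sum_{(i_2, \ldots, i_m)} \beta^{i}_{i_2, \ldots, i_m} \, a_{2 i_2} \cdots a_{m i_m} \tag{2.42}
$$

(the index i in $\beta^{i}_{i_2, \ldots, i_m}$ indicates that these constants are connected with the function
$\varphi_i$). To complete the proof, it remains for us, changing notation, to set $i = i_1$, to
substitute the expressions (2.42) into (2.41), and set $\alpha_{i_1, i_2, \ldots, i_m} = \beta^{i_1}_{i_2, \ldots, i_m}$. □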

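Remark 2.30 The constants $\alpha_{i_1, i_2, \ldots, i_m}$ in the relationship (2.40) can be found from
the formulas

$$
\alpha_{i_1, i_2, \ldots, i_m} = F(\boldsymbol{e}_{i_1}, \boldsymbol{e}_{i_2}, \ldots, \boldsymbol{e}_{i_m}), \tag{2.43}
$$

where $\boldsymbol{e}_j$ again denotes the row (0, ..., 1, ..., 0), in which there is a 1 in the jth
position and zeros everywhere else.

Indeed, if we substitute $\boldsymbol{a}_1 = \boldsymbol{e}_{i_1}$, $\boldsymbol{a}_2 = \boldsymbol{e}_{i_2}$, ..., $\boldsymbol{a}_m = \boldsymbol{e}_{i_m}$ in the relationship
(2.40), then the term $a_{1 i_1} a_{2 i_2} \cdots a_{m i_m}$ becomes 1, while the remaining products
$a_{1 j_1} a_{2 j_2} \cdots a_{m j_m}$ are equal to 0. This proves (2.43).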
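
Remark 2.30 suggests a simple computational experiment: evaluate a multilinear function on the rows $\boldsymbol{e}_j$ to obtain its coefficients, and then rebuild its values from formula (2.40). The Python sketch below does this for a hypothetical bilinear function `F` of two rows of length 2; the function, the helper names `unit_row`, `coefficients`, and `evaluate`, and the sample rows are all assumptions made for illustration. Indices are counted from 0 in the code.

```python
from itertools import product


def unit_row(j, n):
    """The row with 1 in position j (0-based) and zeros elsewhere."""
    return tuple(1 if c == j else 0 for c in range(n))


def coefficients(F, m, n):
    """alpha[(i_1, ..., i_m)] = F(e_{i_1}, ..., e_{i_m}), as in (2.43)."""
    return {idx: F(*(unit_row(i, n) for i in idx))
            for idx in product(range(n), repeat=m)}


def evaluate(alpha, rows):
    """Right-hand side of (2.40) for the given rows."""
    total = 0
    for idx, coeff in alpha.items():
        term = coeff
        for row, i in zip(rows, idx):
            term *= row[i]
        total += term
    return total


def F(a1, a2):
    # A hypothetical bilinear function of two rows of length 2.
    return 3 * a1[0] * a2[0] - 2 * a1[0] * a2[1] + 5 * a1[1] * a2[0]


alpha = coefficients(F, m=2, n=2)
rows = ((2, -1), (4, 7))
print(alpha)
print(evaluate(alpha, rows), F(*rows))
```

The coefficients recovered from the unit rows reproduce the value of the function on arbitrary rows, which is the content of Theorem 2.29 together with (2.43).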

Let us now apply Theorem 2.29 and (2.43) to the determinant |A| as a function
of the rows $\boldsymbol{a}_1, \boldsymbol{a}_2, \ldots, \boldsymbol{a}_n$ of the matrix A. Since we know that the determinant is a
multilinear function, it must satisfy the relationship (2.40) (with m = n), and the
coefficients $\alpha_{i_1, i_2, \ldots, i_n}$ can be determined from formula (2.43). Consequently, $\alpha_{i_1, i_2, \ldots, i_n}$ is
equal to the determinant $|E_{i_1, i_2, \ldots, i_n}|$ of the matrix whose first row is equal to $\boldsymbol{e}_{i_1}$, the
second to $\boldsymbol{e}_{i_2}$, ..., and the nth to $\boldsymbol{e}_{i_n}$. If any of the numbers $i_1, i_2, \ldots, i_n$ are equal,
then $|E_{i_1, i_2, \ldots, i_n}| = 0$, in view of Property 2.10 of Sect. 2.2. It thus remains to examine
the determinant $|E_{i_1, i_2, \ldots, i_n}|$ in the case that $(i_1, i_2, \ldots, i_n)$ is a permutation of
the numbers (1, 2, ..., n). But this determinant is obtained from the determinant |E|
of the identity matrix if we operate on its rows by the permutation $(i_1, i_2, \ldots, i_n)$.
Furthermore, we know that the determinant is an antisymmetric function of its rows
(see Property 2.9 in Sect. 2.2). Therefore, we can apply to it property (2.37) of
antisymmetric functions, and we obtain

$$
|E_{i_1, i_2, \ldots, i_n}| = \varepsilon(I) \cdot |E|, \quad \text{where } I = (i_1, i_2, \ldots, i_n).
$$

Since |E| = 1, we have the equalities $\alpha_{i_1, i_2, \ldots, i_n} = \varepsilon(I)$ if the permutation I is equal
to $(i_1, i_2, \ldots, i_n)$.

As a result, we obtain an expression for the determinant of the matrix A:

$$
|A| = \sum_{I} \varepsilon(I) \cdot a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}, \tag{2.44}
$$

where the sum ranges over all permutations $I = (i_1, i_2, \ldots, i_n)$ of the numbers
(1, 2, ..., n). The expression (2.44) is called the explicit formula for the determinant.
It is worthwhile reformulating this in words:

The determinant of a matrix A is equal to the sum of terms each of which is the product of
n elements $a_{ij}$ of the matrix A, taken one from each row and column. If the factors of such
a product are arranged in increasing order of the row numbers, then the term appears with a
plus or minus sign depending on whether the corresponding column numbers form an even
or odd permutation.
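
The explicit formula (2.44) can be evaluated directly for small matrices. The following Python sketch sums over all permutations of the column numbers, taking the sign from the number of inversions as described in Sect. 2.6. The names `sign` and `det_explicit` and the sample matrix are our own choices; since the sum has n! terms, this is meant as an illustration of the formula, not as a practical method of computation.

```python
from itertools import permutations


def sign(perm):
    """+1 or -1 according to the parity of the number of inversions."""
    n = len(perm)
    inv = sum(1 for l in range(n) for k in range(l + 1, n) if perm[l] > perm[k])
    return -1 if inv % 2 else 1


def det_explicit(a):
    """Determinant by the explicit formula (2.44); columns are 0-based here."""
    n = len(a)
    total = 0
    for cols in permutations(range(n)):
        term = sign(cols)
        # One element from each row, with column numbers given by the permutation.
        for row, col in enumerate(cols):
            term *= a[row][col]
        total += term
    return total


a = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]  # sample matrix
value = det_explicit(a)
print(value)
```

Expanding the same sample matrix along its first row by formula (2.12) gives the same value, as it must.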


2.8 The Rank of a Matrix 

In this section, we introduce several fundamental concepts and use them to prove
several new results about systems of linear equations.

Definition 2.31 A matrix whose ith row coincides with the ith column of a matrix
A for all i is called the transpose of the matrix A and is denoted by A*.

It is clear that if we denote by $a_{ij}$ the element located in the ith row and jth
column of the matrix A, and by $b_{ij}$ the corresponding element of the matrix A*,
then $b_{ij} = a_{ji}$. If the matrix A is of type (n, m), then A* is of type (m, n).

Theorem 2.32 The determinant of the transpose of a square matrix is equal to the
determinant of the original matrix. That is, |A*| = |A|.




Proof Consider the following function of a matrix A:

$$
F(A) = |A^*|.
$$

This function exhibits properties 1 and 2 formulated in Sect. 2.3 (page 37). Indeed,
the rows of the matrix A* are the columns of A, and thus the assertion that the
function F(A) (that is, the determinant |A*| as a function of the matrix A) possesses
properties 1 and 2 for the rows of the matrix A is equivalent to the assertion that the
determinant |A*| possesses the same properties for its columns. This follows from
Theorem 2.17. Therefore, Theorem 2.15 is applicable to F(A), whence

$$
F(A) = k|A|,
$$

where k = F(E) = |E*|, with E the n × n identity matrix. Clearly, E* = E, and
therefore, k = |E*| = |E| = 1. It follows that F(A) = |A|, which completes the
proof of the theorem. □


Definition 2.33 A square matrix A is said to be symmetric if $A = A^*$, and antisymmetric
if $A = -A^*$.

It is clear that if $a_{ij}$ denotes the element located in the ith row and jth column of
a matrix A, then the condition $A = A^*$ can be written in the form $a_{ij} = a_{ji}$, while
$A = -A^*$ can be written as $a_{ij} = -a_{ji}$. From this last relationship, it follows that all
elements $a_{ii}$ on the main diagonal of an antisymmetric matrix must be equal to zero.
Furthermore, it follows from the properties of the determinant that an antisymmetric
matrix of odd order is singular. Indeed, if A is a square matrix of order n, then from
the definition of multiplication of a matrix by a number, the linearity of the
determinant in each row, and Theorem 2.32, we obtain the relationship
$|-A^*| = (-1)^n |A^*| = (-1)^n |A|$, from which $A = -A^*$ yields $|A| = (-1)^n |A|$,
which in the case of odd n is possible only if $|A| = 0$.
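
As an illustration of this argument (a worked example that is not part of the original text), take a general antisymmetric matrix of order 3 with arbitrary numbers a, b, c and expand its determinant along the first row:

```latex
\begin{vmatrix}
 0 &  a & b \\
-a &  0 & c \\
-b & -c & 0
\end{vmatrix}
= 0\cdot\begin{vmatrix} 0 & c \\ -c & 0 \end{vmatrix}
- a\cdot\begin{vmatrix} -a & c \\ -b & 0 \end{vmatrix}
+ b\cdot\begin{vmatrix} -a & 0 \\ -b & -c \end{vmatrix}
= -a\,(bc) + b\,(ac) = 0 .
```

For even order the conclusion may fail: the antisymmetric matrix of order 2 with rows (0, a) and (-a, 0) has determinant $a^2$, which is nonzero whenever a is nonzero.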

Symmetric and antisymmetric matrices play an important role in mathematics
and physics, and we shall encounter them in the following chapters, for example in
the study of bilinear forms.


Definition 2.34 A minor of order r of a matrix

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\tag{2.45}
$$

is a determinant of order r obtained from the matrix (2.45) by eliminating all entries
of the matrix except for those simultaneously in r given rows and r given columns.
Here we clearly must assume that r ≤ m and r ≤ n.

For example, the minors of order 1 are the individual elements of the matrix,
while the unique minor of order n of a square matrix of order n is the determinant
of the entire matrix.

Definition 2.35 The rank of matrix (2.45) is the maximum over the orders of its
nonzero minors.

In other words, the rank is the smallest number r such that all the minors of order
s > r are equal to zero or there are no such minors (if r = min{m, n}).
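
Definition 2.35 can be applied directly, if inefficiently, by examining minors from the largest order downward. The sketch below does this for small integer matrices; the function names `det` and `rank_by_minors` are illustrative assumptions, and the determinant is computed by the explicit sum over permutations.

```python
from itertools import combinations, permutations


def det(a):
    """Determinant by the explicit sum over permutations (adequate for small matrices)."""
    n = len(a)
    total = 0
    for cols in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if cols[i] > cols[j]
        )
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(cols):
            term *= a[row][col]
        total += term
    return total


def rank_by_minors(a):
    """Largest r such that some minor of order r is nonzero (0 for a zero matrix)."""
    m, n = len(a), len(a[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                # Keep only the entries lying in the chosen rows and columns.
                minor = det([[a[i][j] for j in cols] for i in rows])
                if minor != 0:
                    return r
    return 0


A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
print(rank_by_minors(A))  # 2: the second row is twice the first
```

The number of minors grows very quickly with the size of the matrix, which is one practical reason for preferring the elimination method discussed later in this section.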

Let us note one obvious corollary of Theorem 2.32. 

Theorem 2.36 The rank of a matrix is not affected by taking the transpose. 

Proof The minors of the matrix $A^*$ are obtained as the transposes of the minors
of matrix A (in taking the transpose, the indices of the rows and columns change
places). By Theorem 2.32, each minor of $A^*$ is therefore equal to the corresponding
minor of A, and so the ranks of the matrices $A^*$ and A coincide. □

Let us recall that in presenting the method of Gaussian elimination in Sect. 1.2,
we introduced elementary row operations of types I and II on the equations of a
system. These operations changed both the coefficients of the unknowns and the
constant terms. If we now focus our attention solely on the coefficients of the
unknowns, then we may say that we are carrying out elementary operations on the rows
of the matrix of the system. This gives us the possibility of using Gauss's method to
determine the rank of a matrix.

A fundamental property of the rank of a matrix is expressed in the following 
theorem. 

Theorem 2.37 The rank of a matrix is unchanged under elementary operations on 
its rows and columns. 

Proof We shall carry out the proof for elementary row operations of type II (for
type I, the proof is analogous, and even simpler). After adding p times the jth row
of the matrix A to the ith row, we obtain a new matrix; call it B. We shall denote the
rank of a matrix by the operator rk and suppose that rk A = r. If among the nonzero
minors of order r of the matrix A there is at least one not containing the ith row,
then it will not be altered by the given operation, and it follows that it will be a
nonzero minor of the matrix B. Therefore, we may conclude that rk B ≥ r.

Now let us suppose that all nonzero minors of order r of the matrix A contain
the ith row. Let M be one such minor, involving rows numbered $i_1, \ldots, i_r$, where
$i_k = i$ for some k, 1 ≤ k ≤ r. Let us denote by N the minor of the matrix B involving
the rows and columns with the same indices as M. If j coincides with one of the numbers
$i_1, \ldots, i_r$, then this transformation of the matrix A is also an elementary
transformation of the minor M, under which it is converted into N. Since the determinant
is unaffected by an elementary transformation of type II, we must have N = M,
whence it follows that rk B ≥ r.

Now suppose that j does not coincide with any of the numbers $i_1, \ldots, i_r$. Let
us denote by M′ the minor of the matrix A involving the same columns as M and
rows numbered $i_1, \ldots, i_{k-1}, j, i_{k+1}, \ldots, i_r$. In other words, M′ is obtained from M
by replacing the $i_k$th row by the jth row of the matrix A. Since the determinant is a
linear function of its rows, we therefore have the equality N = M + pM′. But by
our assumption, M′ = 0, since the minor M′ does not contain the ith row of the
matrix A. Thus we obtain the equality N = M, from which it follows that rk B ≥ r.

Thus in all cases we have proved that rk B ≥ rk A. However, since the matrix A,
in turn, can be obtained from B by means of an elementary operation of type II, we
have the reverse inequality rk A ≥ rk B. From this, it clearly follows that rk A = rk B.

By similar arguments, but carried out for operations on the columns, we can
show that the rank of a matrix is unchanged under elementary column operations.
Alternatively, the assertion for the columns follows from the analogous assertion about
the rows if we make use of Theorem 2.36. □
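
Theorem 2.37 is what justifies computing the rank by Gaussian elimination: the operations do not change the rank, and the rank of the resulting echelon matrix is the number of its pivot rows. The following sketch assumes a matrix given as a list of rows of integers or fractions; the name `rank_by_elimination` is illustrative, and exact arithmetic with `Fraction` is used to avoid rounding questions.

```python
from fractions import Fraction


def rank_by_elimination(a):
    """Rank of a matrix, found with elementary row operations of types I and II."""
    rows = [[Fraction(x) for x in row] for row in a]
    m, n = len(rows), len(rows[0])
    rank = 0
    for col in range(n):
        # Look for a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        # Type I operation: interchange two rows.
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Type II operations: add a multiple of the pivot row to each lower row.
        for i in range(rank + 1, m):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
        if rank == m:
            break
    return rank


A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
print(rank_by_elimination(A))  # 2
```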

Now we are in a position to formulate answers to the questions that were resolved
earlier by Theorems 1.16 and 1.17, without reducing the system to echelon form
but instead using explicit expressions that depend on the coefficients. Bringing the
system into echelon form will be present in our proofs, but will not appear in the
final formulations.

Let us assume that by elementary operations, we have brought a system of equations
into echelon form (1.18). By Theorem 2.37, both the rank of the matrix of
the system and the rank of the augmented matrix will have remained unchanged.
Clearly, the rank of the matrix of (1.18) is equal to r: a minor at the intersection of
the first r rows and the r columns numbered 1, k, ..., s is equal to
$\bar a_{11}\bar a_{2k}\cdots\bar a_{rs}$,
which implies that it is different from zero, and any minor of greater order
must contain a row of zeros and is therefore equal to zero. Therefore, the rank of the
matrix of the initial system (1.3) is equal to r.

The rank of the augmented matrix of system (1.18) is also equal to r if all the
constants $\bar b_{r+1}, \ldots, \bar b_m$ are equal to zero or if there are no equations with such
numbers (m = r). However, if at least one of the numbers $\bar b_{r+1}, \ldots, \bar b_m$ is different
from zero, then the rank of the augmented matrix will be greater than r. For
example, if $\bar b_{r+1} \neq 0$, then the minor of order r + 1 involving the first r + 1 rows
of the augmented matrix and the columns numbered 1, k, ..., s, n + 1 is equal to
$\bar a_{11}\bar a_{2k}\cdots\bar a_{rs}\bar b_{r+1}$ and is different from zero. Thus the compatibility criterion
formulated in Theorem 1.16 can also be expressed in terms of the rank: the rank of
the matrix of system (1.3) must be equal to the rank of the augmented matrix of the
system. Since by Theorem 2.37, the ranks of the matrix and augmented matrix of the
initial system (1.3) are equal to the ranks of the corresponding matrices of (1.18),
we obtain the compatibility condition called the Rouché-Capelli theorem.

Theorem 2.38 The system of linear equations (1.3) is consistent if and only if the
rank of the matrix of the system is equal to the rank of the augmented matrix.

The same considerations make it possible to reformulate Theorem 1.17 in the
following form.

Theorem 2.39 If the system of linear equations (1.3) is consistent, then it is definite
(that is, it has a unique solution) if and only if the rank of the matrix of the system
is equal to the number of unknowns.
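
Theorems 2.38 and 2.39 together give a purely rank-based test for a system. The sketch below compares the rank of the matrix with that of the augmented matrix; the helper `rank` (Gaussian elimination in exact arithmetic), the function `classify_system`, and the label "indefinite" for a consistent system with more than one solution are illustrative choices rather than anything fixed by the text.

```python
from fractions import Fraction


def rank(a):
    """Rank of a matrix (list of rows) by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in a]
    m, n = len(rows), len(rows[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r


def classify_system(a, b):
    """Classify the system with matrix a and constants b using Theorems 2.38 and 2.39."""
    augmented = [row + [rhs] for row, rhs in zip(a, b)]
    r, r_aug = rank(a), rank(augmented)
    if r != r_aug:
        return "inconsistent"          # Theorem 2.38 (Rouché-Capelli)
    # Consistent: unique solution exactly when the rank equals the number of unknowns.
    return "definite" if r == len(a[0]) else "indefinite"


print(classify_system([[1, 1], [1, -1]], [2, 0]))  # definite
print(classify_system([[1, 1], [2, 2]], [1, 2]))   # indefinite
print(classify_system([[1, 1], [1, 1]], [1, 2]))   # inconsistent
```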

We can explain further the significance of the concept of the rank of a matrix in
the theory of linear equations by introducing a further notion, one that is important
in and of itself.

Definition 2.40 Suppose we are given m rows of a given length n: $a_1, a_2, \ldots, a_m$.
A row a of the same length is said to be a linear combination of $a_1, a_2, \ldots, a_m$ if
there exist numbers $p_1, p_2, \ldots, p_m$ such that $a = p_1 a_1 + p_2 a_2 + \cdots + p_m a_m$.

Let us mention two properties of linear combinations.

1. If a is a linear combination of the rows $a_1, \ldots, a_m$, each of which, in turn, is a
linear combination of the same set of rows $b_1, \ldots, b_k$, then a is a linear combination
of the rows $b_1, \ldots, b_k$.

Indeed, by the definition of a linear combination, there exist numbers $q_{ij}$ such
that

$$a_i = q_{i1} b_1 + q_{i2} b_2 + \cdots + q_{ik} b_k, \qquad i = 1, \ldots, m,$$

and numbers $p_i$ such that $a = p_1 a_1 + p_2 a_2 + \cdots + p_m a_m$. Substituting in the
last equality the expression for the rows $a_i$ in terms of $b_1, \ldots, b_k$, we obtain

$$
\begin{aligned}
a ={}& p_1 (q_{11} b_1 + q_{12} b_2 + \cdots + q_{1k} b_k) \\
 &+ p_2 (q_{21} b_1 + q_{22} b_2 + \cdots + q_{2k} b_k) + \cdots \\
 &+ p_m (q_{m1} b_1 + q_{m2} b_2 + \cdots + q_{mk} b_k).
\end{aligned}
$$

Removing parentheses and collecting like terms yields

$$
\begin{aligned}
a ={}& (p_1 q_{11} + p_2 q_{21} + \cdots + p_m q_{m1})\, b_1 \\
 &+ (p_1 q_{12} + p_2 q_{22} + \cdots + p_m q_{m2})\, b_2 + \cdots \\
 &+ (p_1 q_{1k} + p_2 q_{2k} + \cdots + p_m q_{mk})\, b_k,
\end{aligned}
$$

that is, the expression of a as a linear combination of the rows $b_1, \ldots, b_k$.

2. When we apply elementary operations to the rows of a matrix, we obtain rows
that are linear combinations of the rows of the original matrix.

This is obvious for elementary operations both of type I and of type II.

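Let us apply Gaussian elimination to a certain matrix A of rank r. Changing the
numeration of the rows and columns, we may assume that a nonzero minor of order
r is located in the first r rows and r columns of the matrix. Then by elementary
operations on its first r rows, the matrix is put into the form

$$
\bar A = \begin{pmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1r} & \bar a_{1\,r+1} & \cdots & \bar a_{1n} \\
0 & \bar a_{22} & \cdots & \bar a_{2r} & \bar a_{2\,r+1} & \cdots & \bar a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \bar a_{rr} & \bar a_{r\,r+1} & \cdots & \bar a_{rn} \\
\bar a_{r+1\,1} & \bar a_{r+1\,2} & \cdots & \bar a_{r+1\,r} & \bar a_{r+1\,r+1} & \cdots & \bar a_{r+1\,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\bar a_{m1} & \bar a_{m2} & \cdots & \bar a_{mr} & \bar a_{m\,r+1} & \cdots & \bar a_{mn}
\end{pmatrix},
$$

where $\bar a_{11} \neq 0, \ldots, \bar a_{rr} \neq 0$. We can now subtract from the (r + 1)st row the first
row multiplied by a number such that the first element of the row thus obtained
is equal to zero, then the second row multiplied by a number such that the second
element of the row thus obtained equals zero, and so on. Treating the rows that
follow in the same way, we obtain the matrix

$$
\bar{\bar A} = \begin{pmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1r} & \bar a_{1\,r+1} & \cdots & \bar a_{1n} \\
0 & \bar a_{22} & \cdots & \bar a_{2r} & \bar a_{2\,r+1} & \cdots & \bar a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \bar a_{rr} & \bar a_{r\,r+1} & \cdots & \bar a_{rn} \\
0 & 0 & \cdots & 0 & \bar{\bar a}_{r+1\,r+1} & \cdots & \bar{\bar a}_{r+1\,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \bar{\bar a}_{m\,r+1} & \cdots & \bar{\bar a}_{mn}
\end{pmatrix}.
$$

Since the matrix $\bar{\bar A}$ was obtained from A using a sequence of elementary operations,
its rank must be equal to r.

Let us show that the entire (r + 1)st row of the matrix $\bar{\bar A}$ consists of zeros. Indeed,
if there were an element $\bar{\bar a}_{r+1\,k} \neq 0$ in this row for some k = 1, ..., n (in fact
k > r, since the first r elements of the row are already zero), then the minor
of the matrix $\bar{\bar A}$ formed by the intersection of the first r + 1 rows and the columns
numbered 1, 2, ..., r, k would be given by

$$
\begin{vmatrix}
\bar a_{11} & \bar a_{12} & \cdots & \bar a_{1r} & \bar a_{1k} \\
0 & \bar a_{22} & \cdots & \bar a_{2r} & \bar a_{2k} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \bar a_{rr} & \bar a_{rk} \\
0 & 0 & \cdots & 0 & \bar{\bar a}_{r+1\,k}
\end{vmatrix}
= \bar a_{11} \bar a_{22} \cdots \bar a_{rr}\, \bar{\bar a}_{r+1\,k} \neq 0,
$$

which contradicts the established fact that the rank of $\bar{\bar A}$ is equal to r.

This result can be formulated thus: If $\bar a_1, \ldots, \bar a_{r+1}$ are the first r + 1 rows of the
matrix $\bar A$, then there exist numbers $p_1, \ldots, p_r$ such that

$$\bar a_{r+1} - p_1 \bar a_1 - \cdots - p_r \bar a_r = 0.$$

From this, it follows that $\bar a_{r+1} = p_1 \bar a_1 + \cdots + p_r \bar a_r$. That is, the row $\bar a_{r+1}$ is a linear
combination of the first r rows of the matrix $\bar A$. But the matrix $\bar A$ was obtained as
the result of elementary operations on the first r rows of the matrix A, whence it
follows that all rows of the matrices A and $\bar A$ numbered greater than r coincide.
We see, therefore, that the (r + 1)st row of the matrix A is a linear combination of
the rows $\bar a_1, \ldots, \bar a_r$, each of which, in turn, is a linear combination of the first r
rows of the matrix A. Consequently, by the first property of linear combinations, the
(r + 1)st row of the matrix A is a linear combination of its first r rows.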

This line of reasoning carried out for the (r + 1)st row can be applied equally
well to any row numbered i > r. Therefore, every row of the matrix A is a linear
combination of its first r rows (note that in this case, the first r rows played a special
role, since for notational convenience, we numbered the rows and columns in such
a way that a nonzero minor was located in the first r rows and first r columns). In
the general case, we obtain the following result.

Theorem 2.41 If the rank of a matrix is equal to r, then all of its rows are linear
combinations of some r rows.

Remark 2.42 To put it more precisely, we have shown that if there exists a nonzero
minor of order equal to the rank of the matrix, then every row can be written as a
linear combination of the rows in which this minor is located.
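
A small numerical illustration of Theorem 2.41 and Remark 2.42 may help here; the matrix below is chosen only for the example and does not come from the text.

```latex
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 2 & 5 & 7 \end{pmatrix},
\qquad
\begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = 1 \neq 0,
\qquad
|A| = 1\cdot(7 - 5) - 2\cdot(0 - 2) + 3\cdot(0 - 2) = 0 .
```

The rank is therefore 2, and the nonzero minor of order 2 sits in the first two rows. In agreement with Remark 2.42, the remaining row is a linear combination of those two: (2, 5, 7) = 2 · (1, 2, 3) + 1 · (0, 1, 1).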

The application of these ideas to systems of linear equations is based on the
following obvious lemma. Here, as in a high-school course, we shall call the equation
F(x) = b a corollary of equations (1.10) if every solution c of the system (1.10)
satisfies the relationship F(c) = b. In other words, this means that if we adjoin to the
system (1.10) one additional equation F(x) = b, we obtain an equivalent system.

Lemma 2.43 If in the augmented matrix of the system (1.3), some row (say with
index l) is a linear combination of k rows, with indices $i_1, \ldots, i_k$, then the lth equation
of the system is a corollary of the k equations with those indices.

Proof The proof proceeds by direct verification. To simplify the presentation, let us
assume that we are talking about the first k rows of the augmented matrix. Then by
definition, there exist k numbers $\alpha_1, \ldots, \alpha_k$ such that

$$
\alpha_1 (a_{11}, a_{12}, \ldots, a_{1n}, b_1) + \alpha_2 (a_{21}, a_{22}, \ldots, a_{2n}, b_2) + \cdots
+ \alpha_k (a_{k1}, a_{k2}, \ldots, a_{kn}, b_k)
= (a_{l1}, a_{l2}, \ldots, a_{ln}, b_l).
$$

This means that the following equations are satisfied:

$$
\begin{cases}
\alpha_1 a_{1i} + \alpha_2 a_{2i} + \cdots + \alpha_k a_{ki} = a_{li} & \text{for } i = 1, 2, \ldots, n, \\
\alpha_1 b_1 + \alpha_2 b_2 + \cdots + \alpha_k b_k = b_l.
\end{cases}
$$

Then if we multiply equations numbered 1, 2, ..., k in our system by the numbers
$\alpha_1, \ldots, \alpha_k$ respectively and add the products, we obtain the lth equation of the
system. That is, in the notation of (1.10), we obtain

$$
\alpha_1 F_1(x) + \cdots + \alpha_k F_k(x) = F_l(x), \qquad \alpha_1 b_1 + \cdots + \alpha_k b_k = b_l.
$$

Substituting here x = c, we obtain that if $F_1(c) = b_1, \ldots, F_k(c) = b_k$, then we have
also $F_l(c) = b_l$. That is, the lth equation is a corollary of the first k equations. □

By combining Lemma 2.43 with Theorem 2.41, we obtain the following result.

Theorem 2.44 If the rank of the matrix of system (1.3) coincides with the rank of
its augmented matrix and is equal to r, then all the equations of the system are
corollaries of some r equations of the system.

Therefore, if the rank of the matrix of the consistent system (1.3) is equal to r,
then it is equivalent to a system consisting of some r equations of system (1.3). It is
possible to select as these r equations any such that in the rows with corresponding
indices there occurs a nonzero minor of order r of the matrix of the system (1.3).


2.9 Operations on Matrices 

In this section, we shall define certain operations on matrices that, while simple, are
very important for the following presentation. First, we shall define these operations
purely formally. Their deeper significance will become clear in the examples presented
below, and above all, in the following chapter, where matrices are connected
to geometric concepts by linear transformations of vector spaces.

First of all, let us agree that by the equality A = B for two matrices is meant
that A and B are matrices of the same type and that their elements (denoted by $a_{ij}$
and $b_{ij}$) with like indices are equal. That is, if A and B each have m rows and n
columns, then to write A = B means that the mn equalities $a_{ij} = b_{ij}$ hold for all
indices i = 1, ..., m and j = 1, ..., n.

Definition 2.45 Let A be an arbitrary matrix of type (m, n) with elements $a_{ij}$, and
let p be some number. The product of the matrix A and the number p is the matrix
B, also of type (m, n), whose elements satisfy the equations $b_{ij} = p a_{ij}$. It is denoted
by B = pA.


Just as is done for numbers, the matrix obtained by multiplying A by the number
-1 is denoted by -A and is called the additive inverse or opposite. In the case of the
product obtained by multiplying an arbitrary matrix of type (m, n) by the number 0,
we obviously obtain a matrix of the same type, all of whose elements are zero. It is
called the null or zero matrix of type (m, n) and is denoted by 0.

Definition 2.46 Let A and B be two matrices, each of type (m, n), with elements
denoted as usual by $a_{ij}$ and $b_{ij}$. The sum of A and B is the matrix C, also of type
(m, n), whose elements $c_{ij}$ are defined by the formula $c_{ij} = a_{ij} + b_{ij}$. This is written
as the equality C = A + B.

Let us emphasize that both sum and equality are defined only for matrices of the
same type.

With these definitions in hand, it is now easy to verify that just as in the case
of numbers, one has the following rules for removing parentheses: (p + q)A =
pA + qA for any two numbers p, q and any matrix A, as well as p(A + B) = pA + pB
for any number p and matrices A, B of the same type. It is just as easily verified that
the addition of matrices does not depend on the order of summation, A + B = B + A,
and that the sum of three (or more) matrices does not depend on the arrangement of
parentheses, that is, (A + B) + C = A + (B + C). Using addition and multiplication
by -1, it is possible as well to define the difference of matrices: A - B = A + (-B).

We now define another operation on matrices, the most important of all, called
the matrix product or matrix multiplication. Like addition, this operation is defined
not for matrices of arbitrary type, but only for those whose dimensions obey a certain
relationship.

Definition 2.47 Let A be a matrix of type (m, n), whose elements we shall denote
by $a_{ij}$, and let B be a matrix of type (n, k) with elements $b_{ij}$ (we observe that here
in general, the indices i and j of the elements $a_{ij}$ and $b_{ij}$ run over different sets
of values). The product of matrices A and B is the matrix C of type (m, k) whose
elements $c_{ij}$ are determined by the formula

$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. \tag{2.46}$$

We write the matrix product as C = A · B or simply C = AB.

Thus the product of two rectangular matrices A and B is defined only in the case
that the number of columns of matrix A is equal to the number of rows of matrix B,
while otherwise, the product is undefined (the reason for this will become clear in
the following chapter). The important special case n = m = k shows that the product
of two (and therefore, an arbitrary number of) square matrices of the same order is
well defined.
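
Formula (2.46) translates directly into a few lines of code. The sketch below represents a matrix as a list of rows; the function name `mat_mul` and the choice to raise `ValueError` when the types do not match are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of a matrix a of type (m, n) and a matrix b of type (n, k), by formula (2.46)."""
    m, n, k = len(a), len(b), len(b[0])
    if any(len(row) != n for row in a):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    # c[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j] + ... + a[i][n-1]*b[n-1][j]
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(k)] for i in range(m)]


A = [[1, 2, 0],
     [0, 1, 3]]   # type (2, 3)
B = [[1, 0],
     [2, 1],
     [0, 4]]      # type (3, 2)

print(mat_mul(A, B))  # [[5, 2], [2, 13]], a matrix of type (2, 2)
print(mat_mul(B, A))  # [[1, 2, 0], [2, 5, 3], [0, 4, 12]], a matrix of type (3, 3)
```

Here both AB and BA happen to be defined, but they are matrices of different types, while a product such as AA is rejected because the types (2, 3) and (2, 3) do not match.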

Let us clarify the above definition with the help of some examples.


Example 2.48 In what follows, we shall frequently encounter matrices of types
(1, n) and (n, 1), that is, rows and columns of length n, often called row vectors
and column vectors. For such vectors it is convenient to introduce special notation:

$$
\alpha = (\alpha_1, \ldots, \alpha_n), \qquad
[\beta] = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix},
\tag{2.47}
$$

that is, α is a matrix of type (1, n), while [β] is a matrix of type (n, 1). Such matrices
are clearly related by the transpose operator: $[\alpha] = \alpha^*$ and $[\beta] = \beta^*$. By definition,
then, the product of the matrices in (2.47) is a matrix C of type (1, 1), that is, a
number c, which is equal to

$$c = \alpha_1 \beta_1 + \cdots + \alpha_n \beta_n. \tag{2.48}$$

In the cases n = 2 and n = 3, the product (2.48) coincides with the notion of the
scalar product of vectors, well known from courses in analytic (or even elementary)
geometry, if we consider α and [β] as vectors whose coordinates are written
respectively in the form of a row and the form of a column.

Using formula (2.48), we can express the product rule of matrices given by formula
(2.46) by saying that one multiplies the rows of matrix A by the columns of
matrix B. Put more precisely, the element $c_{ij}$ is determined by formula (2.48) as the
product of the ith row $\alpha_i$ of matrix A and the jth column $[\beta]_j$ of matrix B.

Example 2.49 Let A be a matrix of type (m, n) from formula (1.4) (p. 2), and let
[x] be a matrix of type (n, 1), that is, a column vector, comprising the elements
$x_1, \ldots, x_n$, written analogously to the right-hand side of (2.47). Then their product
A[x] is a matrix of type (m, 1), that is, a column vector, comprising, by formula
(2.46), the elements

$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n, \qquad i = 1, \ldots, m.$$

This shows that the system of linear equations (1.3) that we studied in Sect. 1.1 can
be written in the more abbreviated matrix form A[x] = [b], where [b] is a matrix of
type (m, 1) comprising the constants of the system, $b_1, \ldots, b_m$, written as a column.


Example 2.50 By linear substitution is meant the replacement of variables whereby
old variables $(x_1, \ldots, x_m)$ are linear functions of some new variables $(y_1, \ldots, y_n)$,
that is, they are expressed by the formulas

$$
\begin{cases}
x_1 = a_{11} y_1 + a_{12} y_2 + \cdots + a_{1n} y_n, \\
x_2 = a_{21} y_1 + a_{22} y_2 + \cdots + a_{2n} y_n, \\
\qquad\cdots \\
x_m = a_{m1} y_1 + a_{m2} y_2 + \cdots + a_{mn} y_n,
\end{cases}
\tag{2.49}
$$

with certain coefficients $a_{ij}$. The matrix $A = (a_{ij})$ is called the matrix of the
substitution (2.49). Let us consider the result of two linear substitutions. Let the variables
$(y_1, \ldots, y_n)$ be expressed in turn by $(z_1, \ldots, z_k)$ according to the formulas

$$
\begin{cases}
y_1 = b_{11} z_1 + b_{12} z_2 + \cdots + b_{1k} z_k, \\
y_2 = b_{21} z_1 + b_{22} z_2 + \cdots + b_{2k} z_k, \\
\qquad\cdots \\
y_n = b_{n1} z_1 + b_{n2} z_2 + \cdots + b_{nk} z_k,
\end{cases}
\tag{2.50}
$$

with coefficients $b_{ij}$. Substituting formulas (2.50) into (2.49), we obtain an expression
for the variables $(x_1, \ldots, x_m)$ in terms of $(z_1, \ldots, z_k)$:

$$
\begin{aligned}
x_i &= a_{i1} (b_{11} z_1 + \cdots + b_{1k} z_k) + \cdots + a_{in} (b_{n1} z_1 + \cdots + b_{nk} z_k) \\
    &= (a_{i1} b_{11} + \cdots + a_{in} b_{n1})\, z_1 + \cdots + (a_{i1} b_{1k} + \cdots + a_{in} b_{nk})\, z_k.
\end{aligned}
\tag{2.51}
$$

As was done in the previous example, we may write linear substitutions (2.49) and
(2.50) in the matrix forms [x] = A[y] and [y] = B[z], where [x], [y], [z] are column
vectors, whose elements are the corresponding variables, while A and B are
matrices of types (m, n) and (n, k) with elements $a_{ij}$ and $b_{ij}$. Then, by definition
(2.46), formula (2.51) assumes the form [x] = C[z], where the matrix C is equal to
AB. In other words, successive application of two linear substitutions gives a linear
substitution whose matrix is equal to the product of the matrices of the substitutions.

Remark 2.51 All of this makes it possible to formulate a definition of matrix product
in terms of linear substitutions: the matrix product of A and B is the matrix C that
is the matrix of the substitution obtained by successive applications of two linear
substitutions with matrices A and B.
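
The statement of Example 2.50 can be checked numerically: carrying out the two substitutions one after the other gives the same values as a single substitution with matrix AB. The sketch below uses small matrices chosen only for illustration; the helper names `mat_mul` and `substitute` are assumptions of this example.

```python
def mat_mul(a, b):
    """Matrix product by the rule "row times column" (formula (2.46))."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def substitute(a, v):
    """Values of the old variables for given values v of the new ones: [x] = A[v]."""
    return [sum(coeff * value for coeff, value in zip(row, v)) for row in a]


A = [[1, 2],
     [0, 1],
     [3, -1]]      # x = A y, type (3, 2)
B = [[2, 0, 1],
     [1, 1, 0]]    # y = B z, type (2, 3)
z = [1, 2, 3]

y = substitute(B, z)                         # first express y through z
x_two_steps = substitute(A, y)               # then express x through y
x_one_step = substitute(mat_mul(A, B), z)    # a single substitution with matrix AB

print(y)            # [5, 3]
print(x_two_steps)  # [11, 3, 12]
print(x_one_step)   # [11, 3, 12]
```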

This obvious remark makes it possible to give a simple and graphic demonstration
of an important property of the matrix product, called associativity.

Theorem 2.52 Let A be a matrix of type (m, n), let B be a matrix of type (n, k),
and let D be a matrix of type (k, l). Then

$$(AB)D = A(BD). \tag{2.52}$$

Proof Let us first consider the special case l = 1, that is, the matrix D in (2.52)
is a k-element column vector. As we have remarked, (2.52) is in this case a simple
consequence of the interpretation of the matrix product of A and B as the
result of carrying out two linear substitutions of the variables; in the notation of
Example 2.50, we have simply to substitute [z] = D and then use the equalities
[y] = B[z], [x] = A[y], and [x] = C[z].

In the general case, it suffices for the proof of equation (2.52) to observe that
the product of matrices A and B is reduced to the successive multiplication of
the rows of A by the columns of B. That is, if we write the matrix B in column
form, $B = (B_1, \ldots, B_k)$, then AB can analogously be written in the form
$AB = (AB_1, \ldots, AB_k)$, where each $AB_i$ is a matrix of type (m, 1), that is, also
a column vector. After this, the proof of equality (2.52) in the general case is almost
self-evident. Let D consist of l columns: $D = (D_1, \ldots, D_l)$. Then on the left-hand
side of (2.52), one has the matrix

$$(AB)D = ((AB)D_1, \ldots, (AB)D_l),$$

and on the right-hand side, the matrix

$$A(BD) = A(BD_1, \ldots, BD_l) = (A(BD_1), \ldots, A(BD_l)),$$

and it remains only to use the proved equality (2.52) with l = 1 for each of the
column vectors $D_1, \ldots, D_l$. □


Let us note that we already considered the associative property in a more abstract
form (p. xv). By what was proved there, it follows that the product of any number
of factors does not depend on the arrangement of parentheses among them. Thus
the associative property makes it possible to compute the product of an arbitrary
number of matrices without indicating any arrangement of parentheses (it is necessary
only that each pair of adjacent matrices correspond as to their dimensions
so that multiplication is defined). In particular, the result of the product of an arbitrary
square matrix by itself an arbitrary number of times is well defined. It is called
exponentiation.

Just as for numbers, the operations of addition and multiplication of matrices are
linked by the relationships

$$A(B + C) = AB + AC, \qquad (A + B)C = AC + BC, \tag{2.53}$$

which clearly follow from the definitions. The property (2.53) connecting addition
and multiplication is called the distributive property.

We mention one important property of multiplication involving the identity matrix:
for an arbitrary matrix A of type (m, n) and an arbitrary matrix B of type
(n, m), the following equalities hold:

$$A E_n = A, \qquad E_n B = B.$$

The proofs of both equalities follow from the definition of matrix multiplication, for
example, using the rule "row times column." We see, then, that multiplication by the
matrix E plays the same role as multiplication by 1 among ordinary numbers.

However, another familiar property of multiplication of numbers (called
commutativity), namely that the product of two numbers is independent of the order in
which they are multiplied, is not true for matrix multiplication. This follows at a
minimum from the fact that the product AB of a matrix A of type (n, m) and a matrix
B of type (l, k) is defined only if m = l. It could well be that m = l but k ≠ n,
and then the matrix product BA would not be defined, while the product AB was.
But even, for example, in the case n = m = k = l = 2, with

$$
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
B = \begin{pmatrix} p & q \\ r & s \end{pmatrix},
$$

where both products AB and BA are defined, we obtain

$$
AB = \begin{pmatrix} ap + br & aq + bs \\ cp + dr & cq + ds \end{pmatrix}, \qquad
BA = \begin{pmatrix} ap + cq & bp + dq \\ ar + cs & br + ds \end{pmatrix},
$$
and these are in general unequal matrices. Matrices A and B for which AB = BA
are called commuting matrices.

In connection with the multiplication of matrices, notation is used that we will
introduce only in the special case that we shall actually encounter in what follows.
Assume that we are given a square matrix A of order n and a natural number p < n.
The elements of the matrix A located in the first p rows and first p columns form
a square matrix $A_{11}$ of order p. The elements located in the first p rows and last
n - p columns form a rectangular matrix $A_{12}$ of type (p, n - p). The elements
located in the first p columns and last n - p rows form a rectangular matrix $A_{21}$ of
type (n - p, p). Finally, the elements in the last n - p rows and last n - p columns
form a square matrix $A_{22}$ of order n - p. This can be written as follows:

$$
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.
\tag{2.54}
$$

Formula (2.54) is called the expression of A in block form, while matrices
$A_{11}, A_{12}, A_{21}, A_{22}$ are the blocks of the matrix A. For example, with these
conventions, formula (2.15) takes the form

$$
\begin{vmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{vmatrix} = |A_{11}| \cdot |A_{22}|.
$$

Clearly, one can conceive of a matrix A in block form for a larger number of matrix
blocks of various sizes. In addition to the case (2.54) shown above, we shall find
ourselves in the situation in which blocks stand on the diagonal:

$$
A = \begin{pmatrix}
A_1 & 0 & \cdots & 0 \\
0 & A_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_k
\end{pmatrix}.
$$

Here $A_i$ are square matrices of orders $n_i$, i = 1, ..., k. Then A is a square matrix of
order $n = n_1 + \cdots + n_k$. It is called a block-diagonal matrix.

It is sometimes convenient to notate matrix multiplication in block form. We shall
consider only the case of two square matrices of order n, broken into blocks of the
form (2.54) all of the same size:

$$
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.
\tag{2.55}
$$

Here $A_{11}$ and $B_{11}$ are square matrices of order p, $A_{12}$ and $B_{12}$ are matrices of type
(p, n - p), $A_{21}$ and $B_{21}$ are matrices of type (n - p, p), and $A_{22}$ and $B_{22}$ are square
matrices of order n - p. Then the product C = AB is well defined and is a matrix
of order n that can be broken into the same type of blocks:

$$
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.
$$

We claim that in this case,

$$
\begin{aligned}
C_{11} &= A_{11} B_{11} + A_{12} B_{21}, & C_{12} &= A_{11} B_{12} + A_{12} B_{22}, \\
C_{21} &= A_{21} B_{11} + A_{22} B_{21}, & C_{22} &= A_{21} B_{12} + A_{22} B_{22}.
\end{aligned}
\tag{2.56}
$$

In other words, the matrices (2.55) are multiplied just like matrices of order 2,
except that their elements are not numbers, but blocks, that is, they are themselves
matrices. The proof of formulas (2.56) follows at once from formula (2.46). For
example, let $C_{11} = (c_{ij})$, where 1 ≤ i, j ≤ p. In formula (2.46), the sum of the first
p terms gives the element $c'_{ij}$ of the matrix $A_{11} B_{11}$, while the sum of the remaining
n - p terms gives the element $c''_{ij}$ of the matrix $A_{12} B_{21}$, so that
$c_{ij} = c'_{ij} + c''_{ij}$. Of course, analogous
formulas hold as well (with the same proof) for the multiplication of rectangular
matrices with differing decompositions into blocks; it is necessary only that these
partitions agree among themselves in such a way that the products of all matrices
appearing in the formulas are defined. However, in what follows, only the case (2.55)
described above will be necessary.
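
Formulas (2.56) are easy to test on a concrete example. The sketch below splits two square matrices into blocks as in (2.55), multiplies them block by block, and compares the result with the ordinary product. It assumes 0 < p < n, so that no block is empty; all function names are illustrative.

```python
def mat_mul(a, b):
    """Matrix product by the rule "row times column" (formula (2.46))."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mat_add(a, b):
    """Sum of two matrices of the same type."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_blocks(a, p):
    """Blocks A11, A12, A21, A22 of a square matrix, as in (2.54); assumes 0 < p < n."""
    a11 = [row[:p] for row in a[:p]]
    a12 = [row[p:] for row in a[:p]]
    a21 = [row[:p] for row in a[p:]]
    a22 = [row[p:] for row in a[p:]]
    return a11, a12, a21, a22


def block_product(a, b, p):
    """Product AB assembled from the four blocks given by formulas (2.56)."""
    a11, a12, a21, a22 = split_blocks(a, p)
    b11, b12, b21, b22 = split_blocks(b, p)
    c11 = mat_add(mat_mul(a11, b11), mat_mul(a12, b21))
    c12 = mat_add(mat_mul(a11, b12), mat_mul(a12, b22))
    c21 = mat_add(mat_mul(a21, b11), mat_mul(a22, b21))
    c22 = mat_add(mat_mul(a21, b12), mat_mul(a22, b22))
    # Glue the blocks back together: [C11 C12] on top of [C21 C22].
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


A = [[1, 2, 0],
     [3, 1, 4],
     [0, 5, 2]]
B = [[2, 0, 1],
     [1, 1, 0],
     [4, 3, 2]]
print(block_product(A, B, 1) == mat_mul(A, B))  # True
print(block_product(A, B, 2) == mat_mul(A, B))  # True
```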

The transpose operation is connected with multiplication by an important
relationship. Let the matrix A be of type (n, m), and matrix B of type (m, k). Then

$$(AB)^* = B^* A^*. \tag{2.57}$$

Indeed, by the definition of matrix product (formula (2.46)), an element of the matrix
AB standing at the intersection of the jth row and ith column is equal to

$$a_{j1} b_{1i} + a_{j2} b_{2i} + \cdots + a_{jm} b_{mi}, \quad \text{where } j = 1, \ldots, n, \; i = 1, \ldots, k. \tag{2.58}$$

By definition of the transpose, the expression (2.58) gives us the value of the element
of the matrix $(AB)^*$ standing at the intersection of the ith row and the jth column.
On the other hand, let us consider the product of matrices $B^*$ and $A^*$, using the
rule "row times column" formulated above. Then, taking into account the definition
of the transpose, we obtain that the element of the matrix $B^* A^*$ standing at the
intersection of the ith row and jth column is equal to the product of the ith column
of the matrix B and the jth row of the matrix A, that is, equal to

$$b_{1i} a_{j1} + b_{2i} a_{j2} + \cdots + b_{mi} a_{jm}.$$

This expression coincides with the formula (2.58) for the element of the matrix
$(AB)^*$ standing at the corresponding place, and this establishes equality (2.57).
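
Equality (2.57), including the reversal of the order of the factors, can be confirmed on a small example. The matrices below are chosen only for illustration, and the helper names `transpose` and `mat_mul` are assumptions of this sketch.

```python
def transpose(a):
    """Transpose: the ith row of the result is the ith column of a."""
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """Matrix product by the rule "row times column" (formula (2.46))."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


A = [[1, 2, 0],
     [0, 1, 3]]   # type (2, 3)
B = [[1, 0],
     [2, 1],
     [1, 4]]      # type (3, 2)

left = transpose(mat_mul(A, B))              # (AB)*
right = mat_mul(transpose(B), transpose(A))  # B* A*

print(left)   # [[5, 5], [2, 13]]
print(right)  # [[5, 5], [2, 13]]
```

Note that the product of the transposes taken in the original order, A* B*, is here a matrix of type (3, 3) and so cannot equal (AB)*, which is of type (2, 2).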

It is possible to express, using the operation of multiplication, the elementary 
transformations of matrices that we used in Sect. 1.2 in studying Systems of linear 
équations. Without specifying this especially, we shall continue to keep in mind that 
we are always multiplying matrices whose product is well defined. 

Suppose that we are given a rectangular matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$

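Let us consider a square matrix of order $m$ obtained from the identity matrix of order
$m$ by interchanging the $i$th and $j$th rows:

$$T_{ij} = \begin{pmatrix}
1 &        &        &        &        &        &   \\
  & \ddots &        &        &        &        &   \\
  &        & 0      & \cdots & 1      &        &   \\
  &        & \vdots & \ddots & \vdots &        &   \\
  &        & 1      & \cdots & 0      &        &   \\
  &        &        &        &        & \ddots &   \\
  &        &        &        &        &        & 1
\end{pmatrix}.$$

Here the displayed entries 0 and 1 in the middle block lie in the rows and columns with
indices $i$ and $j$, and all entries not shown are equal to zero.

An easy check shows that $T_{ij}A$ is also obtained from $A$ by interchanging the $i$th and
$j$th rows. Therefore, we can express an elementary operation of type I on a matrix
$A$ by multiplication on the left by a suitable matrix $T_{ij}$.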

Let us consider (for $i \ne j$) a square matrix $U_{ij}(c)$ of order $m$ depending on the
number $c$:

$$U_{ij}(c) = \begin{pmatrix}
1 &        &        &        &   \\
  & \ddots &        & c      &   \\
  &        & \ddots &        &   \\
  &        &        & \ddots &   \\
  &        &        &        & 1
\end{pmatrix}, \tag{2.59}$$

where the entry $c$ stands at the intersection of the $i$th row and $j$th column, and all
other off-diagonal entries are equal to zero. It is obtained from the identity matrix of
order $m$ by adding the $j$th row multiplied by $c$ to the $i$th row. An equally easy
verification shows that the matrix $U_{ij}(c)A$ is obtained from $A$ by adding the $j$th row
multiplied by the number $c$ to the $i$th row.
Therefore, we can also write an elementary operation of type II in terms of matrix
multiplication. Consequently, Theorem 1.15 in matrix form can be expressed as
follows:

Theorem 2.53 An arbitrary matrix $A$ of type $(m, n)$ can be brought into echelon
form by multiplying on the left by the product of a number of suitable matrices $T_{ij}$
and $U_{ij}(c)$ (in the proper order).
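
The effect of left multiplication by the matrices of types I and II can be checked directly. The sketch below is illustrative only: the function names `T` and `U`, the 0-based row indices, and the sample matrix are assumptions of ours, not notation from the text.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def T(m, i, j):
    """Identity matrix of order m with rows i and j interchanged (0-based)."""
    t = identity(m)
    t[i], t[j] = t[j], t[i]
    return t

def U(m, i, j, c):
    """Identity matrix of order m with c placed in row i, column j (i != j, 0-based)."""
    u = identity(m)
    u[i][j] = c
    return u

A = [[1, 2],
     [3, 4],
     [5, 6]]

swapped = matmul(T(3, 0, 2), A)      # rows 0 and 2 of A interchanged
added = matmul(U(3, 0, 1, 10), A)    # row 0 replaced by row 0 + 10 * row 1
print(swapped)
print(added)
```

Multiplying on the left acts on rows; the same matrices applied on the right would act on columns instead.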

Let us examine the important case in which $A$ and $B$ are square matrices of
order $n$. Then their product $C = AB$ is also a square matrix of order $n$.

Theorem 2.54 The determinant of the product of two square matrices of identical
orders is equal to the product of their determinants. That is, $|AB| = |A| \cdot |B|$.

Proof Let us consider the determinant $|AB|$ for a fixed matrix $B$ as a function,
which we denote by $F(A)$, of the rows of the matrix $A$. We shall prove first that
the function $F(A)$ is multilinear. We know (by Property 2.4 from Sect. 2.2) that
the determinant $|C| = F(A)$, considered as a function of the rows of the matrix
$C = AB$, is multilinear. In particular, it is a linear function of the $i$th row of the
matrix $C$, that is,

$$F(A) = \alpha_1 c_{i1} + \alpha_2 c_{i2} + \cdots + \alpha_n c_{in} \tag{2.60}$$

for some numbers $\alpha_1, \ldots, \alpha_n$. Let us focus attention on the fact that according to
formula (2.46), the $i$th row of the matrix $C = AB$ depends only on the $i$th row of
the matrix $A$, while the remaining rows of the matrix $C$, in contrast, do not depend
on this row. After substituting into formula (2.60) the expressions (2.46) for the
elements of the $i$th row and collecting like terms, we obtain an expression for $F(A)$
as a linear function of the $i$th row of the matrix $A$. Therefore, the function $F(A)$ is
multilinear in the rows of $A$. Now let us interchange two rows of the matrix $A$, say
with indices $i_1$ and $i_2$. Formula (2.46) shows us that the $l$th row of the matrix $C$
for $l \ne i_1, i_2$ does not change, but its $i_1$th and $i_2$th rows exchange places. Therefore,
$|C|$ changes sign. This means that the function $F(A)$ is antisymmetric with respect
to the rows of the matrix $A$. We can apply to this function Theorem 2.15, and we
then obtain that $F(A) = k|A|$, where $k = F(E) = |EB| = |B|$, since for an arbitrary
matrix $B$, the relationship $EB = B$ is satisfied. We thereby obtain the equality
$F(A) = |A| \cdot |B|$, where according to our definition, $F(A) = |AB|$. □
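
Theorem 2.54 can be tested numerically on integer matrices, where the arithmetic is exact. The following sketch computes determinants by expansion along the first column; the function names and the two sample matrices are our own and serve only as an illustration, and the method is practical for small orders only.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det(a):
    """Determinant by expansion along the first column (suitable for small matrices)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for r in range(n):
        # delete row r and the first column
        minor = [row[1:] for k, row in enumerate(a) if k != r]
        total += (-1) ** r * a[r][0] * det(minor)
    return total

A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
B = [[1, 2, 0],
     [0, 1, 5],
     [3, 0, 1]]

print(det(A), det(B), det(matmul(A, B)))
```

A single example proves nothing, of course, but it is a convenient safeguard against sign errors when computing by hand.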

Theorem 2.54 has a beautiful generalization to rectangular matrices known as
the Cauchy-Binet identity. We shall not prove it at present, but shall give only its
formulation (a natural proof will be given in Sect. 10.5 on p. 377).

The product of two rectangular matrices $B$ and $A$ results in a square matrix of
order $m$ if $B$ is of type $(m, n)$, and $A$ is of type $(n, m)$. The minors of the matrices $B$
and $A$ of the same order equal to the lesser of $n$ and $m$ are called associates if they
stand in the columns (of matrix $B$) and rows (of matrix $A$) with the same indices.
The Cauchy-Binet identity asserts that the determinant $|BA|$ is equal to 0 if $n < m$,
and $|BA|$ is equal to the sum of the products of the associated minors of order $m$ if
$n \ge m$. In this case, the sum is taken over all collections of rows (of matrix $A$) and
columns (of matrix $B$) with increasing indices $i_1 < i_2 < \cdots < i_m$.

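We have a beautiful special case of the Cauchy-Binet identity when

$$B = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \end{pmatrix}, \qquad
A = \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ \vdots & \vdots \\ a_n & b_n \end{pmatrix}.$$

Then

$$BA = \begin{pmatrix}
a_1^2 + a_2^2 + \cdots + a_n^2 & a_1b_1 + a_2b_2 + \cdots + a_nb_n \\
a_1b_1 + a_2b_2 + \cdots + a_nb_n & b_1^2 + b_2^2 + \cdots + b_n^2
\end{pmatrix},$$

and the associated minors of $B$ and $A$ assume the form

$$\begin{vmatrix} a_i & a_j \\ b_i & b_j \end{vmatrix}, \qquad
\begin{vmatrix} a_i & b_i \\ a_j & b_j \end{vmatrix}$$

for all $i < j$, taking values from 1 to $n$. The Cauchy-Binet identity gives us the
equality

$$(a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) - (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2
= \sum_{i<j} (a_ib_j - a_jb_i)^2.$$

In particular, we derive from it the well-known inequality

$$(a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) \ge (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2.$$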
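
The Cauchy-Binet identity, in the form of a sum of products of associated minors, can be checked on this special case with $n = 3$. The sketch below is an illustration under our own naming (`cauchy_binet_sum` is hypothetical) and with arbitrarily chosen numbers.

```python
from itertools import combinations

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det(a):
    """Determinant by expansion along the first column."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** r * a[r][0] * det([row[1:] for k, row in enumerate(a) if k != r])
               for r in range(n))

def cauchy_binet_sum(B, A):
    """Sum of products of associated minors of order m; B is (m, n), A is (n, m), n >= m."""
    m, n = len(B), len(A)
    total = 0
    for idx in combinations(range(n), m):          # increasing indices i_1 < ... < i_m
        minor_b = [[B[r][c] for c in idx] for r in range(m)]   # columns of B
        minor_a = [A[c] for c in idx]                          # rows of A, same indices
        total += det(minor_b) * det(minor_a)
    return total

a = [1, 2, 3]
b = [4, 5, 6]
B = [a, b]
A = [[a[i], b[i]] for i in range(3)]

left = det(matmul(B, A))
right = cauchy_binet_sum(B, A)
print(left, right)
```

Here each product of associated minors is the square $(a_ib_j - a_jb_i)^2$, so the right-hand side is a sum of nonnegative terms, which is exactly where the inequality comes from.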

The operations of addition and multiplication of matrices make it possible to
define polynomials in matrices. In this we shall of course assume that we are always
speaking about square matrices of a certain fixed order. We shall first define the
operation of exponentiation, namely raising a matrix to the $n$th power. By definition,
$A^n$ for $n > 0$ is the result of multiplying the matrix $A$ by itself $n$ times, while for
$n = 0$, the result will be the identity matrix $E$.

Definition 2.55 Let $f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_k x^k$ be a polynomial with numeric
coefficients. Then a matrix polynomial $f$ for a matrix $A$ is the matrix

$$f(A) = \alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k.$$

Let us establish some simple properties of matrix polynomials.

Lemma 2.56 If $f(x) + g(x) = u(x)$ and $f(x)g(x) = v(x)$, then for an arbitrary
square matrix $A$ we have

$$f(A) + g(A) = u(A), \tag{2.61}$$

$$f(A)g(A) = v(A). \tag{2.62}$$

Proof Let $f(x) = \sum_{i=0}^{n} \alpha_i x^i$ and $g(x) = \sum_{j=0}^{m} \beta_j x^j$. Then $u(x) = \sum_r \gamma_r x^r$ and
$v(x) = \sum_s \delta_s x^s$, where the coefficients $\gamma_r$ and $\delta_s$ can be written in the form

$$\gamma_r = \alpha_r + \beta_r, \qquad \delta_s = \sum_{i=0}^{s} \alpha_i \beta_{s-i},$$

where $\alpha_r = 0$ if $r > n$, and $\beta_r = 0$ if $r > m$. The equality (2.61) is now perfectly
obvious. For the proof of (2.62), we observe that

$$f(A)g(A) = \sum_{i=0}^{n} \alpha_i A^i \cdot \sum_{j=0}^{m} \beta_j A^j = \sum_{i,j} \alpha_i \beta_j A^{i+j}.$$

Collecting all terms for which $i + j = s$, we obtain formula (2.62). □

Corollary 2.57 The polynomials $f(A)$ and $g(A)$ for the same matrix $A$ commute:
$f(A)g(A) = g(A)f(A)$.

Proof The result follows from formula (2.62) and the equality $f(x)g(x) =
g(x)f(x)$. □

Let us observe that the analogous assertion to the lemma just proved is not true for
polynomials in several variables. For example, the identity $(x + y)(x - y) = x^2 - y^2$
will not be preserved in general if we replace $x$ and $y$ with arbitrary matrices. The
reason for this is that the identity depends on the relationship $xy = yx$, which does
not hold for arbitrary matrices.
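
Both remarks can be illustrated computationally. In the sketch below, a polynomial is represented by its list of coefficients $[\alpha_0, \alpha_1, \ldots, \alpha_k]$ and evaluated at a matrix by Horner's scheme; this representation, the function names, and the sample matrices are assumptions made for the example only.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(a, b)]

def poly_eval(coeffs, A):
    """f(A) = coeffs[0]*E + coeffs[1]*A + ... computed by Horner's scheme."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)      # multiply the running value by A
        for d in range(n):
            result[d][d] += c           # then add c * E
    return result

A = [[1, 2], [3, 4]]
f = [1, 2]             # f(x) = 1 + 2x
g = [0, -1, 1]         # g(x) = -x + x^2
v = [0, -1, -1, 2]     # v(x) = f(x) g(x) = -x - x^2 + 2x^3

fg = matmul(poly_eval(f, A), poly_eval(g, A))
gf = matmul(poly_eval(g, A), poly_eval(f, A))

# Two matrices that do not commute: the identity (x + y)(x - y) = x^2 - y^2 fails.
X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
product = matmul(add(X, Y), sub(X, Y))
difference = sub(matmul(X, X), matmul(Y, Y))
print(fg == gf == poly_eval(v, A), product == difference)
```

For the chosen $X$ and $Y$ both squares vanish, so $X^2 - Y^2$ is the zero matrix, while $(X + Y)(X - Y)$ is not.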


2.10 Inverse Matrices 

In this section we shall consider exclusively square matrices of a given order $n$.

Definition 2.58 A matrix $B$ is called the inverse of the matrix $A$ if

$$AB = E. \tag{2.63}$$

Here $E$ denotes the identity matrix of the fixed order $n$.

Not every matrix has an inverse. Indeed, applying Theorem 2.54 on the determinant
of a matrix product to equality (2.63), we obtain

$$|E| = |AB| = |A| \cdot |B|,$$

and since $|E| = 1$, we must have $|A| \cdot |B| = 1$. Clearly, such a relationship
is impossible if $|A| = 0$. Therefore, no singular matrix can have an inverse. The
following theorem shows that the converse of this statement is also true.

Theorem 2.59 For every nonsingular matrix $A$ there exists a matrix $B$ satisfying
the relationship (2.63).

Proof Let us denote the yet unknown $j$th column of the desired inverse matrix $B$ by
$[b]_j$, while $[e]_j$ will denote the $j$th column of the identity matrix $E$. The columns
$[b]_j$ and $[e]_j$ are matrices of type $(n, 1)$, and by the product rule for matrices, the
equality (2.63) is equivalent to the $n$ relationships

$$A[b]_j = [e]_j, \quad j = 1, \ldots, n. \tag{2.64}$$

Therefore, it suffices to prove the solvability, for each fixed $j$, of the system of
linear equations (2.64) in the $n$ unknowns that are the elements of the matrix $B$
appearing in column $[b]_j$. But for every index $j$, the matrix of this system is $A$, and
by hypothesis, $|A| \ne 0$. By Theorem 2.12, such a system has a solution (and indeed,
a unique one). Taking the solution of the system obtained for each index $j$ as the
$j$th column of the matrix $B$, we obtain a matrix satisfying the condition (2.63), that
is, we have found an inverse to the matrix $A$. □

Let us recall that matrix multiplication is not commutative, that is, in general,
$AB \ne BA$. Therefore, it would be natural to consider another possible definition of
the inverse matrix of $A$, namely a matrix $C$ such that

$$CA = E. \tag{2.65}$$

The same reasoning as that carried out at the beginning of this section shows that
such a matrix $C$ does not exist if $A$ is singular.

Theorem 2.60 For an arbitrary nonsingular matrix $A$, there exists a matrix $C$
satisfying relationship (2.65).

Proof This theorem can be proved in two different ways. First, it would be possible
to repeat in full the proof of Theorem 2.59, considering now instead of the columns
of the matrices $C$ and $E$, their rows. But perhaps there is a somewhat more elegant
proof that derives Theorem 2.60 directly from Theorem 2.59. To this end, let us
apply Theorem 2.59 to the transpose matrix $A^*$. By Theorem 2.32, $|A^*| = |A|$, and
therefore, $|A^*| \ne 0$, which means that there exists a matrix $B$ such that

$$A^*B = E. \tag{2.66}$$

Let us apply the transpose operation to both sides of (2.66). It is clear that $E^* = E$.
On the other hand, by (2.57),

$$(A^*B)^* = B^*(A^*)^*,$$

and it is easily verified that $(A^*)^* = A$. We therefore obtain $B^*A = E$, and in (2.65)
we can take the matrix $B^*$ for $C$, where $B$ is defined by (2.66). □

The matrices $B$ from (2.63) and $C$ from (2.65) can make equal claim to the title of
inverse of the matrix $A$. Fortunately, we do not obtain here two different definitions
of the inverse, since these two matrices coincide. Namely, we have the following
result.

Theorem 2.61 For any nonsingular matrix $A$ there exists a unique matrix $B$
satisfying (2.63) and a unique matrix $C$ satisfying (2.65). Moreover, the two matrices
are equal.

Proof Let $A$ be a nonsingular matrix. We shall show that the matrix $B$ satisfying
(2.63) is unique. Let us assume that there exists another matrix, $B'$, such that
$AB' = E$. Then $AB = AB'$, and if we multiply both sides of this equality on the left
by the matrix $C$ such that $CA = E$, whose existence is guaranteed by Theorem 2.60, then
by the associative property of matrix multiplication, we obtain $(CA)B = (CA)B'$,
whence follows the equality $EB = EB'$, that is, $B = B'$. In exactly the same way
we can prove the uniqueness of $C$ satisfying (2.65).

Now let us show that $B = C$. To this end, we consider the product $C(AB)$ and
make use of the associative property of multiplication:

$$C(AB) = (CA)B. \tag{2.67}$$

Then on the one hand, $AB = E$ and $C(AB) = CE = C$, while on the other hand,
$CA = E$ and $(CA)B = EB = B$, and relationship (2.67) gives us $B = C$. □

This unique (by Theorem 2.61) matrix $B = C$ is denoted by $A^{-1}$ and is called
the inverse of the matrix $A$. Thus for every nonsingular matrix $A$, there exists an
inverse matrix $A^{-1}$ satisfying the relationship

$$AA^{-1} = A^{-1}A = E, \tag{2.68}$$

and such a matrix $A^{-1}$ is unique.

In following the proof of Theorem 2.59, we see that it is possible to derive an
explicit formula for the inverse matrix. We again assume that the matrix $A$ is
nonsingular, and following the notation used in the proof of Theorem 2.59, we arrive at
the system of equations (2.64). Since $|A| \ne 0$, we can find a solution of this system
using Cramer's rule (2.35). For an arbitrary index $j = 1, \ldots, n$ in system (2.64), the
$i$th unknown coincides with the element $b_{ij}$ of the matrix $B$. Using Cramer's rule,
we obtain for it the value

$$b_{ij} = \frac{D_{ij}}{|A|}, \tag{2.69}$$

where $D_{ij}$ is the determinant of the matrix obtained from $A$ by replacing the $i$th
column by the column $[e]_j$. The determinant $D_{ij}$ can be expanded along the $i$th
column, and by formula (2.30), we obtain that it is equal to the cofactor of the
unique nonzero (and equal to 1) element of the $i$th column. Since the $i$th column is
equal to $[e]_j$, there is a 1 at the intersection of the $i$th column (which we replaced
by $[e]_j$) and the $j$th row. Therefore, $D_{ij} = A_{ji}$, and formula (2.69) yields

$$b_{ij} = \frac{A_{ji}}{|A|}.$$

This is an explicit formula for the elements of the inverse matrix. In words, this can
be formulated thus: to obtain the inverse matrix of a nonsingular matrix $A$, one must
replace every element with its cofactor, then transpose the matrix thus obtained and
multiply it by the number $|A|^{-1}$.

For example, for the $2 \times 2$ matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

with $\delta = |A| = ad - bc \ne 0$, we obtain the inverse matrix

$$A^{-1} = \begin{pmatrix} d/\delta & -b/\delta \\ -c/\delta & a/\delta \end{pmatrix}.$$

The concept of inverse matrix provides a simple and elegant notation for the
solution of a system of $n$ equations in $n$ unknowns. If, as in the previous section,
we write down the system of linear equations (1.3) with $n = m$ and $A$ a nonsingular
matrix in the form $A[x] = [b]$, where $[x]$ is the column of unknowns $x_1, \ldots, x_n$
and $[b]$ is the column consisting of the constants of the system, then multiplying
this relationship on the left by the matrix $A^{-1}$, we obtain the solution in the form
$[x] = A^{-1}[b]$. Thus, in matrix notation, the formulas for the solution of a system
of $n$ linear equations in $n$ unknowns look just like those for a single equation in
a single unknown. But if we use the formulas for the inverse matrix, then we see
that the relationship $[x] = A^{-1}[b]$ exactly coincides with Cramer's rule, so that this
more elegant notation gives us nothing essentially new.

Let us consider the matrix $\widetilde{A} = (\tilde{a}_{ij})$, in which the element $\tilde{a}_{ij} = A_{ji}$ is the
cofactor of the element $a_{ji}$ of the matrix $A$. The matrix $\widetilde{A}$ is called the adjugate
matrix to $A$. For a matrix $A$ of order $n$, the elements of the adjugate matrix are
polynomials of degree $n - 1$ in the elements of $A$. Formula (2.69) for the inverse
matrix shows that

$$A\widetilde{A} = \widetilde{A}A = |A|E. \tag{2.70}$$

The advantage of the adjugate matrix $\widetilde{A}$ compared to the inverse matrix $A^{-1}$ is that
the definition of $\widetilde{A}$ does not require division by $|A|$, and formula (2.70), in contrast
to the analogous formula (2.68), holds even for $|A| = 0$, that is, even for singular
square matrices, as the proof of Cramer's rule demonstrates. We shall make use of
this fact in the sequel.
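
The cofactor formula for the inverse and the relationship between a matrix and its adjugate can be tried out on small matrices. The following sketch uses exact rational arithmetic from Python's standard `fractions` module; the function names `cofactor`, `adjugate`, and `inverse` are our own, and expansion by cofactors is practical only for small orders.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det(a):
    """Determinant by expansion along the first column; the empty matrix has determinant 1."""
    n = len(a)
    if n == 0:
        return 1
    return sum((-1) ** r * a[r][0] * det([row[1:] for k, row in enumerate(a) if k != r])
               for r in range(n))

def cofactor(a, i, j):
    """Cofactor of the element in row i, column j (0-based)."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]
    return (-1) ** (i + j) * det(minor)

def adjugate(a):
    """Entry (i, j) of the adjugate is the cofactor of the element in row j, column i."""
    n = len(a)
    return [[cofactor(a, j, i) for j in range(n)] for i in range(n)]

def inverse(a):
    d = det(a)
    if d == 0:
        raise ValueError("a singular matrix has no inverse")
    return [[Fraction(x, d) for x in row] for row in adjugate(a)]

A = [[1, 2], [3, 4]]     # nonsingular, determinant -2
S = [[1, 2], [2, 4]]     # singular, determinant 0

print(inverse(A))
print(matmul(A, adjugate(A)), matmul(S, adjugate(S)))
```

For the singular matrix `S` the inverse does not exist, yet the product with the adjugate is still defined and equals the zero matrix, in agreement with (2.70).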

In conclusion, let us return once more to the question of presenting elementary
operations in terms of matrix multiplication, which we began to examine in the
previous section. It is easy to see that the matrices $T_{ij}$ and $U_{ij}(c)$ introduced there
are nonsingular, and moreover,

$$T_{ij}^{-1} = T_{ji}, \qquad U_{ij}^{-1}(c) = U_{ij}(-c).$$

Therefore, Theorem 2.53 can be reformulated as follows: An arbitrary matrix $A$ can
be obtained from a particular echelon matrix $A'$ by multiplying it on the left by
matrices $T_{ij}$ and $U_{ij}(c)$ in a certain order.

Let us apply this result to nonsingular square matrices of order $n$. Since $|T_{ij}| \ne 0$,
$|U_{ij}(c)| \ne 0$, and $|A| \ne 0$ (by assumption), the matrix $A'$ must also be nonsingular.
But a nonsingular square echelon matrix is in upper triangular form, that is, all of
its elements below the main diagonal are equal to zero, namely,

$$A' = \begin{pmatrix}
a'_{11} & a'_{12} & a'_{13} & \cdots & a'_{1n} \\
0 & a'_{22} & a'_{23} & \cdots & a'_{2n} \\
0 & 0 & a'_{33} & \cdots & a'_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a'_{nn}
\end{pmatrix},$$

and moreover, $|A'| = a'_{11}a'_{22} \cdots a'_{nn}$. Therefore, all the elements $a'_{11}, \ldots, a'_{nn}$ on the
main diagonal are different from zero.

But this matrix $A'$ can be brought into a yet simpler form with the help of
elementary operations of type II only. Namely, since $a'_{nn} \ne 0$, one can subtract from
the rows with indices $n - 1, n - 2, \ldots, 1$ of the matrix $A'$ the last row multiplied by
factors that make all the elements of the $n$th column (except for $a'_{nn}$) equal to zero.
Since $a'_{n-1,n-1} \ne 0$, it is possible in the same way to reduce to zero all elements
of the $(n - 1)$st column (except for the element $a'_{n-1,n-1}$). Doing this $n$ times, we
shall make all of the elements of the matrix equal to zero except those on the main
diagonal. That is, we end up with the matrix

$$D = \begin{pmatrix}
a'_{11} & 0 & 0 & \cdots & 0 \\
0 & a'_{22} & 0 & \cdots & 0 \\
0 & 0 & a'_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a'_{nn}
\end{pmatrix}. \tag{2.71}$$

A matrix all of whose elements are equal to zero except for those on the main
diagonal is called a diagonal matrix. We have thus proved that a matrix $A'$ can be
obtained from a diagonal matrix $D$ by multiplying it on the left by matrices of the
form $T_{ij}$ and $U_{ij}(c)$ in some order.

Let us note that multiplication by a matrix $T_{ij}$ (that is, an elementary operation
of type I) can be replaced by multiplication on the left by matrices of type $U_{ij}(c)$
for various $c$ and by a certain simpler matrix. Namely, the interchange of the $i$th and
$j$th rows can be obtained using the following four operations:

1. Addition of the $i$th row to the $j$th row.

2. Subtraction of the $j$th row from the $i$th row.

3. Addition of the $i$th row to the $j$th row.

Schematically, this can be depicted as follows, where the $i$th and $j$th rows are
denoted by $c_i$ and $c_j$:

$$\begin{pmatrix} c_i \\ c_j \end{pmatrix} \xrightarrow{1}
\begin{pmatrix} c_i \\ c_i + c_j \end{pmatrix} \xrightarrow{2}
\begin{pmatrix} -c_j \\ c_i + c_j \end{pmatrix} \xrightarrow{3}
\begin{pmatrix} -c_j \\ c_i \end{pmatrix}.$$

4. It is now necessary to introduce a new type of operation: its effect is to multiply
the $i$th row by $-1$ and is achieved by multiplying (with $k = i$) our matrix on the
left by the square matrix

$$S_k = \begin{pmatrix}
1 &        &    &        &   \\
  & \ddots &    &        &   \\
  &        & -1 &        &   \\
  &        &    & \ddots &   \\
  &        &    &        & 1
\end{pmatrix}, \tag{2.72}$$

where there is $-1$ at the intersection of the $k$th row and $k$th column.

We may now reformulate Theorem 2.53 as follows:


Theorem 2.62 Any nonsingular matrix can be obtained from a diagonal matrix by
multiplying it on the left by certain matrices $U_{ij}(c)$ of the form (2.59) and matrices
$S_k$ of the form (2.72).
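
The four operations that replace an interchange of rows can be multiplied out explicitly. In the sketch below the function names `U`, `S`, and `T`, the 0-based indices, and the choice $m = 3$ are assumptions of ours; recall that the matrix of each successive operation stands to the left of the previous ones.

```python
def identity(m):
    return [[1 if r == c else 0 for c in range(m)] for r in range(m)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def U(m, i, j, c):
    """Adds c times row j to row i when applied on the left."""
    u = identity(m)
    u[i][j] = c
    return u

def S(m, k):
    """Multiplies row k by -1 when applied on the left."""
    s = identity(m)
    s[k][k] = -1
    return s

def T(m, i, j):
    """Interchanges rows i and j when applied on the left."""
    t = identity(m)
    t[i], t[j] = t[j], t[i]
    return t

m, i, j = 3, 0, 2
step1 = U(m, j, i, 1)     # add row i to row j
step2 = U(m, i, j, -1)    # subtract row j from row i
step3 = U(m, j, i, 1)     # add row i to row j
step4 = S(m, i)           # multiply row i by -1

product = matmul(step4, matmul(step3, matmul(step2, step1)))
print(product == T(m, i, j))
```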


We shall use this result in Sect. 4.4 when we introduce the orientation of a real
vector space. Furthermore, Theorem 2.62 provides a simple and convenient method
of computing the inverse matrix, in a manner based on Gaussian elimination. To this
end, we introduce yet another (a third) type of elementary matrix operation, which
consists in multiplying the $k$th row of a matrix by an arbitrary nonzero number $\alpha$.
It is clear that the result of such an operation can be obtained by multiplying our
matrix on the left by the square matrix

$$V_k(\alpha) = \begin{pmatrix}
1 &        &        &        &   \\
  & \ddots &        &        &   \\
  &        & \alpha &        &   \\
  &        &        & \ddots &   \\
  &        &        &        & 1
\end{pmatrix}, \tag{2.73}$$

where the number $\alpha$ stands at the intersection of the $k$th row and $k$th column. By
multiplying the matrix (2.71) on the left by the matrices $V_1\bigl((a'_{11})^{-1}\bigr), \ldots, V_n\bigl((a'_{nn})^{-1}\bigr)$, we
transform it into the identity matrix.

From Theorem 2.62, it follows that every nonsingular matrix can be obtained
from the identity matrix by multiplying it on the left by matrices $U_{ij}(c)$ of the type
given in (2.59), matrices $S_k$ from (2.72), and matrices $V_k(\alpha)$ of the form of (2.73).
However, since multiplication by each of these matrices is equivalent to an
elementary operation of one of the three types, this means that every nonsingular matrix
can be obtained from the identity matrix using a sequence of such operations, and
conversely, using a certain number of elementary operations of all three types, it is
possible to obtain the identity from an arbitrary nonsingular matrix. This gives us
a convenient method of computing the inverse matrix. Indeed, suppose that using
some sequence of elementary operations of all three types, we have transformed
the matrix $A$ to the identity matrix $E$. Let us denote by $B$ the product of all the matrices
$U_{ij}(c)$, $S_k$, and $V_k(\alpha)$ that correspond to the given operations (in the
obvious order: the matrix representing each successive operation stands to the left
of the previous one). Then $BA = E$, from which it follows that $B = A^{-1}$. Then
after applying the same sequence of elementary operations to the matrix $E$, we obtain
from it the matrix $BE = B$, that is, $A^{-1}$. Therefore, to compute $A^{-1}$, it suffices to
transform the matrix $A$ to $E$ using elementary operations of the three types (as was
shown above), while simultaneously applying the same operations to the matrix $E$.
The matrix obtained from $E$ as a result of the same elementary operations will be
$A^{-1}$.
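
The method just described can be sketched in a few lines. The version below is only an illustration under our own assumptions: it works column by column, uses a direct interchange of rows (type I) instead of decomposing it into operations of types II and (2.72), and relies on exact rational arithmetic from Python's `fractions` module. The function name is hypothetical.

```python
from fractions import Fraction

def inverse_by_row_operations(a):
    """Transform A into E by elementary row operations, applying each one to E as well."""
    n = len(a)
    left = [[Fraction(x) for x in row] for row in a]                        # becomes E
    right = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]   # becomes the inverse
    for col in range(n):
        # Type I: bring a row with a nonzero element in this column into position.
        pivot = next((r for r in range(col, n) if left[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the matrix is singular")
        left[col], left[pivot] = left[pivot], left[col]
        right[col], right[pivot] = right[pivot], right[col]
        # Type III: multiply the row by the inverse of its diagonal element.
        p = left[col][col]
        left[col] = [x / p for x in left[col]]
        right[col] = [x / p for x in right[col]]
        # Type II: subtract multiples of this row to clear the rest of the column.
        for r in range(n):
            if r != col and left[r][col] != 0:
                c = left[r][col]
                left[r] = [x - c * y for x, y in zip(left[r], left[col])]
                right[r] = [x - c * y for x, y in zip(right[r], right[col])]
    return right

A = [[2, 1],
     [5, 3]]
A_inv = inverse_by_row_operations(A)
print(A_inv)
```

In hand computation the same bookkeeping is usually done by writing $E$ to the right of $A$ and operating on the rows of the combined array.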

Let $C$ be an arbitrary matrix of type $(m, n)$. We shall show that for an arbitrary
nonsingular square matrix $A$ of order $m$, the rank of the product $AC$ is equal to
the rank of $C$. Indeed, as we have already seen, the matrix $A$ can be transformed
into $E$ by applying some sequence of elementary operations of the three types to its
rows, to which corresponds multiplication on the left by the matrix $A^{-1}$. Applying
the same sequence of operations to $AC$, we clearly obtain the matrix $A^{-1}AC = C$.
By Theorem 2.37, the rank of a matrix is not changed by elementary operations
of types I and II. It also does not change under elementary operations of type III.
This clearly follows from the fact that every minor is a linear function of its rows,
and consequently, every nonzero minor of a matrix remains a nonzero minor after
multiplication of any of its rows by an arbitrary nonzero number. Therefore, the rank
of the matrix $AC$ is equal to the rank of $C$.

Using an analogous argument for the columns as was given for the rows, or
simply using Theorem 2.36, we obtain the following useful result.

Theorem 2.63 For any matrix $C$ of type $(m, n)$ and any nonsingular square
matrices $A$ and $B$ of orders $m$ and $n$, the rank of $ACB$ is equal to the rank of $C$.
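
Theorem 2.63 can be illustrated on a small example. In the sketch below the rank is computed by bringing the matrix to echelon form with exact rational arithmetic; the function `rank` and the three sample matrices are our own choices and are not taken from the text.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rank(a):
    """Number of nonzero rows after reduction to echelon form."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

C = [[1, 2, 3],
     [2, 4, 6]]          # type (2, 3), rank 1
A = [[1, 1],
     [0, 1]]             # nonsingular, order 2
B = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]          # nonsingular, order 3

print(rank(C), rank(matmul(matmul(A, C), B)))
```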


Chapter 3 

Vector Spaces


3.1 The Definition of a Vector Space

Vectors on a line, in the plane, or in space play a significant role in mathematics, and
especially in physics. Vectors represent the displacement of bodies, or their speed,
acceleration, or the force applied to them, among many other things.

In a course in elementary mathematics or physics, a vector is defined as a
directed line segment. The word directed indicates that a direction is assigned to the
segment, often indicated by an arrow drawn above it. Or else, perhaps, one of the
two endpoints of the segment $[A, B]$, say $A$, is called the beginning, while the other,
$B$, is the end, and then the direction is given as motion from the beginning of the
segment to the end. Then two vectors $x = \overrightarrow{AB}$ and $y = \overrightarrow{CD}$ are said to be equal if
it is possible by means of parallel translation to join the segments $x$ and $y$ in such a
way that the beginning $A$ of segment $x$ coincides with the beginning $C$ of segment
$y$ (in which case their ends must coincide as well); see Fig. 3.1.

The fact that we consider the two different vectors in the figure to be equal
does not represent anything unusual in mathematics or generally in human thought.
Rather, it represents the usual method of abstraction, whereby we focus our
attention on some important property of the objects under consideration. Thus in
geometry, we consider certain triangles to be equal, even though they are drawn on
different sheets of paper. Or in arithmetic, we might consider equal the number of
people in a boat and the number of apples on a tree.

It is obvious that having chosen a certain point $O$ (on a line, in the plane, or in
space), we can find a vector (indeed the unique one) equal to a given vector $x$ whose
beginning coincides with the point $O$.

The laws of addition of velocities, accelerations, and forces lead to the following
definition of vector addition. The sum of vectors $x = \overrightarrow{AB}$ and $y = \overrightarrow{CD}$ is the vector
$z = \overrightarrow{AD'}$, where $D'$ is the end of the vector $\overrightarrow{BD'}$, a vector equal to $y$ whose beginning
coincides with the end $B$ of the vector $x$; see Fig. 3.2.

If we replace all of these vectors with equal vectors but having as their beginning
the fixed point $O$, then vector addition will proceed by the well-known
"parallelogram law"; see Fig. 3.3.

Fig. 3.1 Equal vectors

Fig. 3.2 Vector summation

Fig. 3.3 The parallelogram law
There is also a definition of multiplication of a vector $x$ by a number $\alpha$. For now,
in speaking about numbers, we shall mean real numbers (we shall have something
to say later about the more general situation). If $\alpha > 0$ and $x$ is the vector $\overrightarrow{AB}$, then
the product $\alpha x$ is defined to be the vector $\overrightarrow{AC}$ lying on the same line as $[A, B]$ in
such a way that the point $C$ lies on the same side of $A$ as the point $B$ and such
that the segment $[A, C]$ is $\alpha$ times the length of the segment $[A, B]$. (Note that if
$\alpha < 1$, then the segment $[A, C]$ is shorter than the segment $[A, B]$.) Denoting by
$|AB|$ the length of the segment $[A, B]$, we shall express this by way of the formula
$|AC| = \alpha|AB|$. However, if $\alpha < 0$ and $\alpha = -\beta$, where then $\beta > 0$, then the product
$\alpha x$ is defined to be the vector $\overrightarrow{CA}$, where $\beta x = \overrightarrow{AC}$.

We shall not derive the simple properties of vector addition and multiplication of
a vector by a number. We observe only that they are amazingly similar for vectors on
a line, in the plane, and in space. This similarity indicates that we are dealing only
with a special case of a general concept. In this and several subsequent chapters,
we shall present the theory of vectors and the spaces consisting of them of
arbitrary dimension $n$ (including even some facts relating to spaces whose dimension is
infinite).

How do we formulate such a definition? In the case of vectors on a line, in the
plane, and in space, we shall use the intuitively clear concept of directed line
segment. But what if we are not convinced that our interlocutor shares the same
intuition? For example, suppose we wanted to share our knowledge with an
extraterrestrial with whom we are communicating by radio?

A technique was long ago devised for overcoming such difficulties in the
sciences. It involves defining (or in our terminology, reporting to the extraterrestrial)
not what are the objects under consideration (vectors, etc.), but the relationships
between them, or in other words, their properties. For example, in geometry, one leaves
undefined such notions as point, line, and the property of a line passing through a
point, and instead formulates some of their properties, for instance that between two
distinct points there passes one and only one line. Such a method of defining new
concepts is called axiomatic. In this course on linear algebra, the vector space will
be the first object to be defined axiomatically. Till now, new concepts have been
defined using constructions or formulas, such as the definition of the determinant
of a matrix (defined either inductively, using the rule of expansion by columns, or
derived using the rather complicated explicit formula (2.44) from Sect. 2.7). It is,
however, possible that the reader has encountered the concepts of groups and fields,
which are also defined axiomatically, but may not have investigated them in detail,
in contrast to the notion of a vector space, the study of which will occupy this entire
chapter.

With that, we move on to the definition of a vector space.

Definition 3.1 A vector (or linear) space is a set $L$ (whose elements we shall call
vectors and denote by $x$, $y$, $z$, etc.) for which the following conditions are satisfied:

(1) There is a rule for associating with any two vectors $x$ and $y$ a third vector, called
their sum and denoted by $x + y$.

(2) There is a rule for associating with any vector $x$ and any number $\alpha$ a new vector,
called the product of $\alpha$ and $x$ and denoted by $\alpha x$. (The numbers $\alpha$ by which a
vector can be multiplied, be they real, complex, or from any field $\mathbb{K}$, are called
scalars.)

These operations must satisfy the following conditions:

(a) $x + y = y + x$.

(b) $(x + y) + z = x + (y + z)$.

(c) There exists a vector $\mathbf{0} \in L$ such that for an arbitrary vector $x \in L$, the sum $x + \mathbf{0}$
is equal to $x$ (the vector $\mathbf{0}$ is called the null vector).

(d) For each vector $x \in L$, there exists a vector $-x \in L$ such that $x + (-x) = \mathbf{0}$ (the
vectors $x$ and $-x$ are called additive inverses or opposites of each other).¹

(e) For an arbitrary scalar $\alpha$ and vectors $x$ and $y$,

$$\alpha(x + y) = \alpha x + \alpha y.$$

(f) For arbitrary scalars $\alpha$ and $\beta$ and vector $x$,

$$(\alpha + \beta)x = \alpha x + \beta x.$$

(g) Similarly,

$$\alpha(\beta x) = (\alpha\beta)x.$$

(h) For an arbitrary vector $x$,

$$1x = x \quad \text{and} \quad 0x = \mathbf{0}.$$

In the last equality, the $\mathbf{0}$ on the right-hand side denotes the null vector of the space
$L$, while the 0 on the left is the scalar zero (these will always be so denoted using
lighter and heavier type).

1 Readers who are familiar with the concept of a group will be able to reformulate conditions
(a)-(d) in a compact way by saying that with respect to the operation of vector addition, the vectors
form an abelian group.

It is easy to prove that there is a unique null vector in $L$. Indeed, if there were
another null vector $\mathbf{0}'$, then by definition, we would have the equality $\mathbf{0}' = \mathbf{0}' + \mathbf{0} = \mathbf{0}$,
from which it follows that $\mathbf{0}' = \mathbf{0}$.

Using properties (a) through (d) and the uniqueness of the null vector, it is easily
proved that for an arbitrary $x$, there is a unique additive inverse vector $-x$ in $L$.

It follows from properties (f) and (h) that the vector $-x$ is obtained by multiplying
the vector $x$ by the scalar $-1$. Indeed, since

$$x + (-1)x = 1x + (-1)x = (1 + (-1))x = 0x = \mathbf{0},$$

we obtain by the uniqueness of the additive inverse that $(-1)x = -x$. Analogously,
from properties (f) and (h), it follows that for every vector $x$ and natural number $k$,
the vector $kx$ is equal to the $k$-fold sum $x + \cdots + x$.
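
For readers who like to experiment, conditions (a) through (h) can be spot-checked for rows of rational numbers of a fixed length. The sketch below is an assumption-laden illustration: the names `add`, `scale`, and `axioms_hold` are ours, and a check on finitely many sample vectors and scalars is of course not a proof.

```python
from fractions import Fraction
from itertools import product

def add(x, y):
    """Componentwise sum of two rows of equal length."""
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    """Product of the scalar alpha and the row x."""
    return tuple(alpha * a for a in x)

def axioms_hold(vectors, scalars, zero):
    """Check conditions (a)-(h) on the given finite samples only."""
    for x, y, z in product(vectors, repeat=3):
        if add(x, y) != add(y, x):                      # (a)
            return False
        if add(add(x, y), z) != add(x, add(y, z)):      # (b)
            return False
    for x in vectors:
        if add(x, zero) != x:                           # (c)
            return False
        if add(x, scale(-1, x)) != zero:                # (d), taking -x = (-1)x
            return False
        if scale(1, x) != x or scale(0, x) != zero:     # (h)
            return False
    for alpha, beta in product(scalars, repeat=2):
        for x, y in product(vectors, repeat=2):
            if scale(alpha, add(x, y)) != add(scale(alpha, x), scale(alpha, y)):   # (e)
                return False
            if scale(alpha + beta, x) != add(scale(alpha, x), scale(beta, x)):     # (f)
                return False
            if scale(alpha, scale(beta, x)) != scale(alpha * beta, x):             # (g)
                return False
    return True

zero = (Fraction(0), Fraction(0), Fraction(0))
vectors = [zero,
           (Fraction(1), Fraction(2), Fraction(3)),
           (Fraction(-1, 2), Fraction(0), Fraction(5))]
scalars = [Fraction(-1), Fraction(0), Fraction(2, 3)]

ok = axioms_hold(vectors, scalars, zero)
x = vectors[1]
threefold = add(add(x, x), x)      # the 3-fold sum x + x + x
print(ok, threefold == scale(3, x))
```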

Remark 3.2 (On scalars and fields) We would like to make more precise what we
mean by scalars $\alpha$, $\beta$, etc. in the definition of vector space above. The majority of
readers will probably assume that we are talking about real numbers. In this case, $L$
is called a real vector space. But those who are familiar with complex numbers may
choose to understand the scalars $\alpha$, $\beta$, etc., as complex. In that case, $L$ will be called
a complex vector space. The theory developed below will be applicable in this case
as well. Finally, the reader familiar with the concept of field may combine these two
cases, understanding the scalars involved in the definition of a vector space to be
elements of any field $\mathbb{K}$. Then $L$ will be called a vector space over the field $\mathbb{K}$.

Strictly speaking, this question of scalars could have been addressed in the
preceding chapters in which we discussed numbers without going into much detail. The
answer would have been the same: by scalars, one may understand real numbers,
complex numbers, or the elements of any field. All of our arguments apply equally
to all three cases. The only exception is the proof of Property 2.10 from Sect. 2.2, in
which we used the fact that from the equality $2D = 0$ it followed that $D = 0$. A field
in which that assertion is true for every element $D$ is called a field of characteristic
different from 2.² Nonetheless, it is possible to prove that Property 2.10 holds in the
general case as well.

Example 3.3 We présent here a few examples of vector spaces. 

(a) The set of vectors on a line, in the plane, or in space as we hâve previously 
discussed. 

(b) In Sect. 2.9, we introduced the notions of addition of matrices and multiplication
of a matrix by a number. It is easily verified that the set of matrices of a given
type $(m, n)$ with operations thus defined is a vector space. That conditions (a)
through (h) are satisfied reduces to the corresponding properties of numbers. In
particular, the set of rows (or columns) of a given length $n$ is a vector space.
We shall denote this space by $\mathbb{K}^n$ if the row (or column) elements belong to the
field $\mathbb{K}$. Here it is understood that if we are operating with real numbers only,
then $\mathbb{K} = \mathbb{R}$, and the space will then be denoted by $\mathbb{R}^n$. If we are using complex
numbers, then $\mathbb{K} = \mathbb{C}$, and the vector space will be denoted by $\mathbb{C}^n$. The reader
may choose any of these designations.

(c) Let L be the set of all continuous functions defined on a given interval $[a, b]$
taking real or complex values. We define addition of such functions and multiplication
by a scalar in the usual way. It is then clear that L is a vector space.

(d) Let L be the set of all polynomials (of arbitrary degree) with real or complex
coefficients or coefficients in a field $\mathbb{K}$. Addition and multiplication by a scalar
are defined as usual. Then it is obvious that L is a vector space.

(e) Let L be the collection of all polynomials whose degree does not exceed a fixed
number $n$. Everything else is the same as in the previous example. We again
obtain a vector space (one for each value of $n$).

Definition 3.4 A subset $L'$ of a vector space L is called a subspace of L if for arbitrary
vectors $x, y \in L'$, their sum $x + y$ is also in $L'$, and for an arbitrary scalar $\alpha$
and vector $x \in L'$, the vector $\alpha x$ is in $L'$.

It is obvious that $L'$ is itself a vector space.

Example 3.5 The space L is a subspace of itself. 

Example 3.6 The vector 0 by itself forms a subspace. It is called the zero space and
is denoted by (0).³


² For readers familiar with the definition of a field, we can give a general definition: The
characteristic of a field $\mathbb{K}$ is the smallest natural number $k$ such that the $k$-fold sum
$kD = D + \cdots + D$ is equal to 0 for every element $D \in \mathbb{K}$ (as is easily seen, this number $k$
is the same for all $D \neq 0$). If no such natural number $k$ exists (as in, for example, the most
frequently encountered fields, $\mathbb{K} = \mathbb{R}$ and $\mathbb{K} = \mathbb{C}$), then the characteristic is defined to be zero.

³ Translator's note: It may be tempting to consider "null space" a possible synonym for the zero
space. However, that term is reserved as a synonym for "kernel," to be introduced below, in
Definition 3.67.

Example 3.7 Consider the space encountered in analytic geometry consisting of all
vectors having their beginning at a certain fixed point $O$. Then an arbitrary line
and an arbitrary plane passing through the point $O$ will be subspaces of the entire
enclosing vector space.

Example 3.8 Consider a system of homogeneous linear equations in $n$ unknowns
with coefficients in the field $\mathbb{K}$. Then the set of rows forming the solution set is a
subspace $L'$ of the space $\mathbb{K}^n$ of rows of length $n$. This follows from the notation
(1.10) of such a system (with $b_i = 0$) and properties (1.8) and (1.9) of linear functions.
The subspace $L'$ is called the solution subspace of the associated system of
homogeneous linear equations. The equations of the system determine the subspace
$L'$ just as the equation of a line or plane does in analytic geometry.
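The closure properties that make a solution set a subspace can be checked numerically. The following is a minimal sketch in Python, assuming a hypothetical system consisting of one homogeneous equation over the rational numbers; the names `SYSTEM`, `is_solution`, `add`, and `scale` are illustrative and do not come from the text.

```python
from fractions import Fraction

# Hypothetical homogeneous system in n = 3 unknowns: each row holds the
# coefficients of one equation a1*x1 + a2*x2 + a3*x3 = 0.
SYSTEM = [
    [Fraction(1), Fraction(1), Fraction(-2)],
]

def is_solution(row):
    """Return True if `row` satisfies every equation of SYSTEM."""
    return all(sum(a * x for a, x in zip(eq, row)) == 0 for eq in SYSTEM)

def add(x, y):
    """Componentwise sum of two rows."""
    return [a + b for a, b in zip(x, y)]

def scale(alpha, x):
    """Product of the row x by the scalar alpha."""
    return [alpha * a for a in x]

u = [Fraction(1), Fraction(1), Fraction(1)]  # 1 + 1 - 2 = 0
v = [Fraction(2), Fraction(0), Fraction(1)]  # 2 + 0 - 2 = 0

# Both conditions in the definition of a subspace hold for these solutions.
closed_under_addition = is_solution(add(u, v))
closed_under_scaling = is_solution(scale(Fraction(5, 7), u))
```

A check on particular rows is of course not a proof; the general statement rests on the linearity properties cited in the example.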

Example 3.9 In the space of all polynomials, the collection of all polynomials with
degree at most $n$ (for any fixed number $n$) is a subspace.

Definition 3.10 A space L is called the sum of a collection of its subspaces
$L_1, L_2, \ldots, L_k$ if every vector $x \in L$ can be written in the form

$x = x_1 + x_2 + \cdots + x_k$, where $x_i \in L_i$. (3.1)

In that case, we write

$L = L_1 + L_2 + \cdots + L_k$.


Definition 3.11 A space L is called the direct sum of its subspaces $L_1, L_2, \ldots, L_k$ if
it is the sum of these subspaces and in addition, for every vector $x \in L$, the
representation (3.1) is unique. In this case, we write

$L = L_1 \oplus L_2 \oplus \cdots \oplus L_k$. (3.2)

Example 3.12 The space that we considered in Example 3.7 is the sum of two planes
if they do not coincide; it is the sum of a line and plane if the line is not contained
in the given plane; it is the sum of three lines if they do not belong to a common
plane. In the second and third cases, the sum will be a direct sum. In the case of
two planes, it is easily seen that the representation (3.1) is not unique. For example,
we can represent the null vector as a sum of two vectors that are additive inverses
of each other lying on the line that is obtained as the intersection of the two given
planes.

Example 3.13 Let us denote by $L_i$ the vector space consisting of all monomials of
degree $i$. Then the space L of polynomials of degree at most $n$ can be represented as
the direct sum $L = L_0 \oplus L_1 \oplus \cdots \oplus L_n$. This follows from the fact that an arbitrary
polynomial is uniquely determined by its coefficients.

Lemma 3.14 Suppose the vector space L is the sum of certain of its subspaces
$L_1, L_2, \ldots, L_k$. Then in order for L to be a direct sum of these subspaces, it is necessary
and sufficient that the relationship

$x_1 + x_2 + \cdots + x_k = 0$, $x_i \in L_i$, (3.3)

hold only if all the $x_i$ are equal to 0.

Proof The necessity of condition (3.3) is clear, since for the vector $0 \in L$, the equality
$0 = 0 + \cdots + 0$, in which the null vector of the subspace $L_i$ stands in the $i$th
place, is a representation of type (3.1), and the presence of another equality of the
form (3.3) would contradict the definition of direct sum. To prove the sufficiency of
the condition (3.3), if there are two representations (3.1),

$x = x_1 + x_2 + \cdots + x_k$, $x = y_1 + y_2 + \cdots + y_k$,

then subtracting one from the other gives $(x_1 - y_1) + \cdots + (x_k - y_k) = 0$ with
$x_i - y_i \in L_i$, and condition (3.3) yields $x_i = y_i$ for all $i$. Hence the representation
(3.1) is unique, as the definition of direct sum requires. □

We observe that if $L_1, L_2, \ldots, L_k$ are subspaces of a vector space L, then their
intersection $L_1 \cap L_2 \cap \cdots \cap L_k$ is also a subspace of L, since it satisfies all the
requirements in the definition of subspace. In the case $k = 2$, Lemma 3.14 allows
us to obtain in the following corollary another, more graphic, criterion for the sum
of subspaces to be a direct sum.

Corollary 3.15 Suppose the vector space L is the sum of two of its subspaces $L_1$
and $L_2$. Then in order that L be a direct sum, it is necessary and sufficient that one
have the equality $L_1 \cap L_2 = (0)$.

Proof By Lemma 3.14, L is the direct sum of its subspaces $L_1$ and $L_2$ if and only if
the equation $x_1 + x_2 = 0$, where $x_1 \in L_1$ and $x_2 \in L_2$, is satisfied only if $x_1 = 0$ and
$x_2 = 0$. But from $x_1 + x_2 = 0$, it follows that the vector $x_1 = -x_2$ is contained in
both subspaces $L_1$ and $L_2$, whence it follows that it is contained in the intersection
$L_1 \cap L_2$. Therefore, the condition $L = L_1 \oplus L_2$ is equivalent to the satisfaction of the
two conditions $L = L_1 + L_2$ and $L_1 \cap L_2 = (0)$, which completes the proof. □

We observe that the last assertion cannot be generalized to an arbitrary number
of subspaces $L_1, \ldots, L_k$. For example, suppose that L is the plane consisting of all
vectors with origin at $O$, and suppose that $L_1, L_2, L_3$ are three distinct lines in this
plane passing through $O$. It is clear that the intersection of any two of these lines
consists of only the zero vector, and so a fortiori, $L_1 \cap L_2 \cap L_3 = (0)$. The plane L
is the sum of its subspaces $L_1, L_2, L_3$, but it is not the direct sum, since it is obvious
that one can produce the equality $x_1 + x_2 + x_3 = 0$ for nonnull vectors $x_i \in L_i$.
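The counterexample with three lines can be made concrete. The sketch below, in Python, assumes three hypothetical direction vectors `d1`, `d2`, `d3` in the coordinate plane (they are not taken from the text) and exhibits nonnull vectors on the three lines whose sum is zero, although the lines intersect pairwise only in the zero vector.

```python
# Hypothetical direction vectors of three distinct lines through O.
d1, d2, d3 = (1, 0), (0, 1), (1, 1)

def det(u, v):
    """2x2 determinant; it is nonzero exactly when u and v are not proportional."""
    return u[0] * v[1] - u[1] * v[0]

# Two lines through O meet only in the zero vector when their
# direction vectors are not proportional.
pairwise_trivial = all(det(u, v) != 0 for u, v in [(d1, d2), (d1, d3), (d2, d3)])

# Nonnull vectors x1 in L1, x2 in L2, x3 in L3 (here x3 = -d3).
x1, x2, x3 = d1, d2, (-1, -1)
total = tuple(a + b + c for a, b, c in zip(x1, x2, x3))  # the sum x1 + x2 + x3
```

Since `total` is the zero vector while no summand is zero, condition (3.3) fails, so the sum of the three lines is not direct.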

It is easy to see that if equality (3.2) is satisfied, then there exists a bijection
between the set of vectors $x \in L$ and the set $L_1 \times \cdots \times L_k$, the product of the sets
$L_1, \ldots, L_k$ (see the definition on page xvi). This observation provides a method for
constructing the direct sum of vector spaces that are not, so to speak, originally
subspaces of a larger enclosing space and even have perhaps completely different
structures from one another.

Let $L_1, \ldots, L_k$ be vector spaces. Just as for any other sets, we can define their
product $L = L_1 \times \cdots \times L_k$, which in this case is not yet a vector space. However, it is
easy to make it into one by defining the sum and the product by a scalar according
to the following formulas:

$(x_1, \ldots, x_k) + (y_1, \ldots, y_k) = (x_1 + y_1, \ldots, x_k + y_k)$,

$\alpha(x_1, \ldots, x_k) = (\alpha x_1, \ldots, \alpha x_k)$,

for all vectors $x_i \in L_i$, $y_i \in L_i$, $i = 1, \ldots, k$, and an arbitrary scalar $\alpha$.

A simple verification shows that in this way, the definition of the operations satisfies
all the conditions for the definition of a vector space, and the set $L = L_1 \times \cdots \times L_k$
becomes a vector space containing $L_1, \ldots, L_k$ among its subspaces. If we wish to be
technically precise, then the subspaces of L are not the $L_i$ themselves, but the sets
$L_i' = (0) \times \cdots \times L_i \times \cdots \times (0)$, where $L_i$ stands in the $i$th place, with the zero space
at all the remaining places. However, we shall close our eyes to this
circumstance, identifying $L_i'$ with $L_i$ itself.⁴ It is clear, then, that condition (3.2) is
satisfied. Thus, for arbitrary mutually independent vector spaces $L_1, \ldots, L_k$ it is always
possible to construct a space L containing all the $L_i$ as subspaces that is their
direct sum; that is, $L = L_1 \oplus \cdots \oplus L_k$.
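The componentwise operations on the product $L_1 \times \cdots \times L_k$ can be sketched in Python. This is an illustration under stated assumptions: each $L_i$ is modeled as a space of numeric tuples of fixed length, and the helper names (`vec_add`, `sum_add`, `embed`, and so on) are hypothetical, not notation from the text.

```python
def vec_add(x, y):
    """Sum of two vectors of the same component space."""
    return tuple(a + b for a, b in zip(x, y))

def vec_scale(alpha, x):
    """Product of the vector x by the scalar alpha."""
    return tuple(alpha * a for a in x)

def sum_add(X, Y):
    """(x_1, ..., x_k) + (y_1, ..., y_k) = (x_1 + y_1, ..., x_k + y_k)."""
    return tuple(vec_add(x, y) for x, y in zip(X, Y))

def sum_scale(alpha, X):
    """alpha (x_1, ..., x_k) = (alpha x_1, ..., alpha x_k)."""
    return tuple(vec_scale(alpha, x) for x in X)

def embed(i, x, zeros):
    """Image of x under the identification of L_i with (0) x ... x L_i x ... x (0)."""
    return tuple(x if j == i else z for j, z in enumerate(zeros))

# Assumed example: L_1 consists of triples and L_2 of 1-tuples.
zeros = ((0, 0, 0), (0,))
X = ((1, 2, 3), (4,))
Y = ((0, 1, 0), (2,))

S = sum_add(X, Y)        # componentwise sum
T = sum_scale(2, X)      # componentwise scalar multiple
# X decomposes as the sum of its embedded components, as in (3.1).
decomposed = sum_add(embed(0, X[0], zeros), embed(1, X[1], zeros))
```

The decomposition into embedded components is unique because each slot of the tuple is determined separately, which is the content of condition (3.2) in this model.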

Example 3.16 Let $L_1$ be the vector space considered in Example 3.7, that is, the
physical space that surrounds us, and let $L_2 = \mathbb{R}$ be the real line, considered as the
time axis. Operating as described above, we can define the direct sum $L = L_1 \oplus L_2$.

The vectors of the space L thus constructed are called space-time events and have
the form $(x, t)$, where $x \in L_1$ is the space component, and $t \in L_2$ is the time
component. For the addition of such vectors, the space components are added among
themselves (as vectors in physical space, for example, according to the parallelogram
law), while the time components are added to one another (as real numbers).
Multiplication by a scalar is defined analogously. This space plays an important
role in physics, in particular in the theory of relativity, where it is called Minkowski
space. We remark that we still need to introduce some additional structure, namely
a particular quadratic form. We shall return to this question in Sect. 7.7 (see p. 268).


⁴ More precisely, this identification is achieved with the help of the concept of isomorphism of
vector spaces, which will be introduced below, in Sect. 3.5.


3.2 Dimension and Basis

In this section we shall use the notion of linear combination, which in the case of
a space of rows (or row space) of length $n$ has already been introduced (see the
definition on p. 57). We shall now repeat that definition practically verbatim. In
preparation, we observe that applying repeatedly the operations of vector addition
and multiplication of a vector by a scalar, we can form more complex expressions,
such as $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m$, which, moreover, according to properties (a)
and (b) of the definition of vector space, do not depend on the order of terms or the
arrangement of parentheses (which is necessary in order that we be able to combine
not only two vectors, but $m$ of them).

Definition 3.17 In the vector space L, let $x_1, x_2, \ldots, x_m$ be $m$ vectors. A vector $y$
is called a linear combination of these $m$ vectors if

$y = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m$, (3.4)

for some scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$.

The collection of all vectors that are linear combinations of some given vectors
$x_1, x_2, \ldots, x_m$, that is, those having the form (3.4) for all possible $\alpha_1, \alpha_2, \ldots, \alpha_m$,
clearly satisfies the definition of a subspace. This subspace is called the linear span
of the vectors $x_1, x_2, \ldots, x_m$ and is denoted by $\langle x_1, x_2, \ldots, x_m \rangle$. It is clear that

$\langle x_1, x_2, \ldots, x_m \rangle = \langle x_1 \rangle + \langle x_2 \rangle + \cdots + \langle x_m \rangle$. (3.5)

Definition 3.18 Vectors $x_1, x_2, \ldots, x_m$ are called linearly dependent if there exists
a linear combination (3.4) equal to 0 not all of whose coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$ are
equal to zero. Otherwise, $x_1, x_2, \ldots, x_m$ are said to be linearly independent.

Thus vectors $x_1, x_2, \ldots, x_m$ are linearly dependent if for some scalars
$\alpha_1, \alpha_2, \ldots, \alpha_m$, one has

$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m = 0$, (3.6)

with at least one $\alpha_i$ not equal to 0. For example, the vectors $x_1$ and $x_2 = -x_1$ are
linearly dependent. Conversely, the vectors $x_1, x_2, \ldots, x_m$ are linearly independent
if (3.6) holds only for $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$. In this case, the sum (3.5) is a direct
sum, that is,

$\langle x_1, x_2, \ldots, x_m \rangle = \langle x_1 \rangle \oplus \langle x_2 \rangle \oplus \cdots \oplus \langle x_m \rangle$.

Here is a useful reformulation: Vectors $x_1, x_2, \ldots, x_m$ are linearly dependent if
and only if one of them is a linear combination of the others. Indeed, if

$x_i = \alpha_1 x_1 + \cdots + \alpha_{i-1} x_{i-1} + \alpha_{i+1} x_{i+1} + \cdots + \alpha_m x_m$, (3.7)

then we have the relationship (3.6) with $\alpha_i = -1$. Conversely, if in (3.6), the coefficient
$\alpha_i$ is not equal to 0, then if we transfer the term $\alpha_i x_i$ to the right-hand side and
multiply both sides of the equality by the scalar $-\alpha_i^{-1}$, we obtain a representation
of $x_i$ as a linear combination of $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m$.
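For rows of rational numbers, linear dependence can be tested mechanically. The sketch below is one possible approach, not the method of the text: it assumes the standard fact that $m$ rows are linearly independent exactly when Gaussian elimination leaves $m$ nonzero rows, and the function names `rank` and `linearly_independent` are illustrative.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows left after Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Look for a row at or below position rk with a nonzero entry in this column.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        # Eliminate the entries below the pivot.
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def linearly_independent(vectors):
    """True if only the trivial combination of the given rows equals 0."""
    return rank(vectors) == len(vectors)

x1 = (1, 2)
x2 = (-1, -2)   # x2 = -x1, the dependent pair mentioned in the text
dependent_pair = not linearly_independent([x1, x2])
independent_pair = linearly_independent([(1, 0), (0, 1)])
```

Exact rational arithmetic is used so that the test for a zero entry is reliable; with floating-point numbers a tolerance would be needed.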

We are finally in a position to formulate the main definition of this section (and
perhaps of the entire chapter).




Definition 3.19 The dimension of a vector space L is the largest number of linearly
independent vectors in the space, if such a number exists. The dimension of a vector
space is denoted by $\dim L$, and if the greatest number of linearly independent vectors
is finite, the space L is said to be finite-dimensional. If there is no maximum number
of linearly independent vectors in L, then the space is said to be infinite-dimensional.
The dimension of the vector space (0) is by definition equal to zero.

Thus the dimension of a vector space is equal to the natural number $n$ if the
space contains $n$ linearly independent vectors and every set of $m$ vectors for $m > n$
is linearly dependent. A vector space is infinite-dimensional if there is a collection
of $n$ linearly independent vectors for every natural number $n$. Employing standard
terminology, we shall call a space of dimension 1 a line and a space of dimension 2
a plane.

Example 3.20 It is well known from elementary geometry (or from a course in 
analytic geometry) that vectors on a line, in the plane, or in the physical space that 
surrounds us form vector spaces of dimension 1, 2, and 3. This is the principal 
intuitive basis of the general definition of dimensionality.

Example 3.21 The space of all polynomials in the variable $t$ is clearly infinite-dimensional,
since for an arbitrary number $n$, the polynomials $1, t, \ldots, t^{n-1}$ are
linearly independent. The space of all continuous functions on the interval $[a, b]$ is
a fortiori infinite-dimensional.

The dimension of a vector space L depends not only on the set itself whose elements
are the vectors of L, but also on the field over which it is defined. This will be
made clear in the following examples.

Example 3.22 Let $L_1$ be the space whose vectors are the complex numbers, defined
over the field $\mathbb{C}$. The operations of vector addition and multiplication by a scalar will
be defined as the usual operations of addition and multiplication of complex numbers.
Then it is easily seen from the definition that $\dim L_1 = 1$. If we now consider
the vector space $L_2$ likewise consisting of the complex numbers, but defined over the
field $\mathbb{R}$, then we obtain $\dim L_2 = 2$. This, as we shall see, follows from the fact that
every complex number is uniquely defined by a pair of real numbers (its real and
imaginary parts). The frequently encountered expression "complex plane" implies
the two-dimensional space $L_2$ over the field $\mathbb{R}$, while the expression "complex line"
indicates the one-dimensional space $L_1$ over the field $\mathbb{C}$.
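The two ways of viewing the complex numbers can be illustrated with Python's built-in `complex` type. This is only a sketch: the sample value `z` is an arbitrary assumption, and the choice of the vectors $1$ and $i$ as a spanning pair over $\mathbb{R}$ anticipates the notion of basis.

```python
z = complex(3, 4)  # an arbitrary element of the set of complex numbers

# Over the field R: z is a combination of the two vectors 1 and i
# with real coefficients (its real and imaginary parts).
real_coords = (z.real, z.imag)
rebuilt_over_R = real_coords[0] * complex(1, 0) + real_coords[1] * complex(0, 1)

# Over the field C: z is a multiple of the single vector 1,
# with one complex coefficient (namely z itself).
complex_coord = z
rebuilt_over_C = complex_coord * complex(1, 0)
```

Two real coefficients are needed in the first case and one complex coefficient in the second, in agreement with $\dim L_2 = 2$ and $\dim L_1 = 1$.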

Example 3.23 Let L be the vector space consisting of the real numbers, but defined
over the field $\mathbb{Q}$ of rational numbers (it is easy to see that all the conditions for the
definition of a vector space are satisfied). In this case, in a linear combination (3.4),
vectors $x_i$ and $y$ are real numbers, while $\alpha_i$ is a rational number. By properties of
sets of numbers proved in a course in real analysis, it follows that the space L is
infinite-dimensional. Indeed, if the dimension of L were some finite number $n$, then
as we shall prove below, it would imply that there exist numbers $x_1, \ldots, x_n \in \mathbb{R}$
such that an arbitrary $y \in \mathbb{R}$ could be written as a linear combination (3.4) with
suitable coefficients $\alpha_1, \ldots, \alpha_n$ from the field $\mathbb{Q}$. But that would imply that the set
of real numbers is countable, which, as is known from real analysis, is not the case.

It is obvious that the dimension of a subspace L' of a vector space L cannot be 
greater than the dimension of the entire space L. 

Theorem 3.24 If the dimension of a subspace $L'$ of a vector space L is equal to the
dimension of L, then the subspace $L'$ is equal to all of L.

Proof Suppose $\dim L' = \dim L = n$. Then in $L'$ one could find $n$ linearly independent
vectors $x_1, \ldots, x_n$. If $L' \neq L$, then in L there would be some vector $x \notin L'$. Since
$\dim L = n$, it follows that any $n + 1$ vectors in this space are linearly dependent.
In particular, the vectors $x_1, \ldots, x_n, x$ are linearly dependent. That is, there is a
relationship

$\alpha_1 x_1 + \cdots + \alpha_n x_n + \alpha x = 0$

with not all coefficients equal to zero. If we had $\alpha = 0$, then this would yield the
linear dependence of the vectors $x_1, \ldots, x_n$, which are linearly independent by
assumption. This means that $\alpha \neq 0$ and $x = \beta_1 x_1 + \cdots + \beta_n x_n$, $\beta_i = -\alpha^{-1}\alpha_i$, from
which it follows that $x$ is a linear combination of the vectors $x_1, \ldots, x_n$. It clearly
follows from the definition of a subspace that a linear combination of vectors in $L'$
is itself a vector in $L'$. Hence we have $x \in L'$, and $L' = L$. □

If the dimension of a vector space L is finite, $\dim L = n$, and a subspace $L' \subset L$
has dimension $n - 1$, then $L'$ is called a hyperplane in L.

There is a defect in the definition of dimension given above: it is not effective.
Theoretically, in order to determine the dimension of a vector space, it would be
necessary to look at all systems of vectors $x_1, \ldots, x_m$ for various $m$ in the space
and determine whether each is linearly independent. With such a method, it is not
so simple to determine the dimension of the row space of length $n$ or of the space
of polynomials of degree less than or equal to $n$. Therefore, we shall investigate the
notion of dimension in greater detail.

Definition 3.25 Vectors $e_1, \ldots, e_n$ of a vector space L are called a basis if they
are linearly independent and every vector in the space L can be written as a linear
combination of these vectors.

Thus if $e_1, \ldots, e_n$ is a basis of the space L, then for an arbitrary vector $x \in L$
there exists an expression of the form

$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n$. (3.8)

Theorem 3.26 For an arbitrary vector $x$, the expression (3.8) is unique.

Proof This is a direct consequence of the fact that the vectors $e_1, \ldots, e_n$ form a
basis. Let us assume that there are two expressions

$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n$, $x = \beta_1 e_1 + \beta_2 e_2 + \cdots + \beta_n e_n$.

Subtracting one equality from the other, we obtain

$(\alpha_1 - \beta_1) e_1 + (\alpha_2 - \beta_2) e_2 + \cdots + (\alpha_n - \beta_n) e_n = 0$.

But since the vectors $e_1, \ldots, e_n$ form a basis, then by definition, they are linearly
independent. From this it follows that $\alpha_1 = \beta_1, \alpha_2 = \beta_2, \ldots, \alpha_n = \beta_n$, as was to be
proved. □

Corollary 3.27 If $e_1, \ldots, e_n$ is a basis of the vector space L, then L can be written
in the form

$L = \langle e_1 \rangle \oplus \langle e_2 \rangle \oplus \cdots \oplus \langle e_n \rangle$.

Definition 3.28 The numbers $\alpha_1, \ldots, \alpha_n$ in the expression (3.8) are called the
coordinates of the vector $x$ with respect to the basis $e_1, \ldots, e_n$ (or coordinates in that
basis).

Example 3.29 An arbitrary vector $e \neq 0$ on a line (that is, a one-dimensional vector
space) forms a basis of the line. For an arbitrary vector $x$ on the same line, we
have the expression (3.8), which in the given case takes the form $x = \alpha e$ with some
scalar $\alpha$. This $\alpha$ is the coordinate (in this case the only one) of the vector $x$ in the
basis $e$. If $e' \neq 0$ is another vector on the same line, then it provides another basis.
We have seen that $e' = ce$ for some scalar $c \neq 0$ (since $e' \neq 0$). Therefore, from the
relationship $x = \alpha e$ we obtain that $x = \alpha c^{-1} e'$. Thus in the basis $e'$, the coordinate
of the vector $x$ is equal to $\alpha c^{-1}$.

Thus we have seen that the coordinates of a vector $x$ depend not only on the vector
itself, but on the basis that we use (in the general case, $e_1, \ldots, e_n$). Consequently,
the coordinates of a vector are not an "intrinsic geometric" property. The situation
here is similar to the measurement of physical quantities: the length of a line segment
or the mass of a body. Neither the one nor the other can be characterized by a
number. It is necessary as well to have a unit of measurement: in the first case, the
meter, centimeter, etc.; in the second, the kilogram, gram, etc. We shall encounter
such a phenomenon repeatedly: some object (such as, for example, a vector) cannot
be defined "in and of itself" by some set or other of numbers; rather, something
similar to a unit of measurement (in our case, a basis) must be chosen. Here, there
are always two possible points of view: either to choose some method of associating
numbers with the object or to limit oneself to the study of its "purely intrinsic"
properties, independent of the method of association. For example, in physics, we
are interested in physical quantities themselves, but the laws of nature are usually
expressed in the form of mathematical relationships among the numbers that characterize
them. We will try to reconcile both points of view after defining how the
numbers that characterize the object change under different methods of associating
numbers with the object. In particular, in Sect. 3.4, we shall consider the question
of how the coordinates of a vector change under a change of basis.

In terms of the coordinates of vectors (relative to an arbitrary basis $e_1, \ldots, e_n$),
it is easy to express the operations that enter into the definition of a vector space,
namely the addition of vectors and the multiplication of a vector by a scalar. Namely,
if $x$ and $y$ are two vectors, and

$x = \alpha_1 e_1 + \cdots + \alpha_n e_n$, $y = \beta_1 e_1 + \cdots + \beta_n e_n$,

then

$x + y = (\alpha_1 e_1 + \cdots + \alpha_n e_n) + (\beta_1 e_1 + \cdots + \beta_n e_n) = (\alpha_1 + \beta_1) e_1 + \cdots + (\alpha_n + \beta_n) e_n$, (3.9)

and for an arbitrary scalar $\alpha$,

$\alpha x = \alpha(\alpha_1 e_1 + \cdots + \alpha_n e_n) = (\alpha\alpha_1) e_1 + \cdots + (\alpha\alpha_n) e_n$, (3.10)

so that the coordinates of vectors under addition are added, and under multiplication
by a scalar, they are multiplied by that scalar.
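Formulas (3.9) and (3.10) can be checked in a basis other than the standard one. The following sketch assumes a hypothetical basis `E1`, `E2` of the space of rational rows of length 2 and computes coordinates by Cramer's rule; the function `coords` is an illustration and is not part of the text.

```python
from fractions import Fraction

# Hypothetical basis of the space of rows of length 2 over the rationals.
E1 = (Fraction(1), Fraction(1))
E2 = (Fraction(1), Fraction(-1))

def coords(x):
    """Coordinates (a1, a2) with x = a1*E1 + a2*E2, found by Cramer's rule."""
    d = E1[0] * E2[1] - E1[1] * E2[0]          # nonzero because E1, E2 form a basis
    a1 = (x[0] * E2[1] - x[1] * E2[0]) / d
    a2 = (E1[0] * x[1] - E1[1] * x[0]) / d
    return (a1, a2)

x = (3, 1)
y = (0, 2)
x_plus_y = (x[0] + y[0], x[1] + y[1])
five_x = (5 * x[0], 5 * x[1])

# (3.9): the coordinates of x + y are the sums of the coordinates of x and y.
sum_of_coords = tuple(a + b for a, b in zip(coords(x), coords(y)))
# (3.10): the coordinates of 5x are 5 times the coordinates of x.
scaled_coords = tuple(5 * a for a in coords(x))
```

Here `coords(x_plus_y)` agrees with `sum_of_coords` and `coords(five_x)` agrees with `scaled_coords`, as the formulas predict.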

It follows from the definition of a basis that if $\dim L = n$ and $e_1, \ldots, e_n$ is any set
of $n$ linearly independent vectors in L, then they form a basis of L. Indeed, it suffices
to verify that an arbitrary vector $x \in L$ can be written as a linear combination of
these vectors. But from the definition of dimension, the $n + 1$ vectors $x, e_1, \ldots, e_n$ are
linearly dependent, that is,

$\beta x + \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n = 0$

for some scalars $\beta, \alpha_1, \alpha_2, \ldots, \alpha_n$, not all equal to zero. In this case, $\beta \neq 0$, for
otherwise, this would contradict the linear independence of the vectors $e_1, \ldots, e_n$. But then

$x = -\beta^{-1}\alpha_1 e_1 - \beta^{-1}\alpha_2 e_2 - \cdots - \beta^{-1}\alpha_n e_n$,

which was to be proved.

From the definition, it follows that if the dimension of a vector space L is equal
to $n$, then there exist $n$ linearly independent vectors in L, which by what we have
proved, form a basis. Now we shall establish a more general fact.

Theorem 3.30 If $e_1, \ldots, e_m$ are linearly independent vectors in a vector space L of
finite dimension $n$, then this set of vectors can be extended to a basis of L, that is,
there exist vectors $e_i$, $m < i \le n$, such that $e_1, \ldots, e_m, e_{m+1}, \ldots, e_n$ is a basis of L.

Proof If the vectors $e_1, \ldots, e_m$ already form a basis, then $m = n$, and the theorem
is proved. If they do not form a basis, then clearly $m < n$, and there exists a
vector $e_{m+1}$ in L that is not a linear combination of $e_1, \ldots, e_m$. Thus the vectors
$e_1, \ldots, e_{m+1}$ are linearly independent. Indeed, if they were linearly dependent, we
would have the relationship

$\alpha_1 e_1 + \cdots + \alpha_m e_m + \alpha_{m+1} e_{m+1} = 0$, (3.11)

in which not all the $\alpha_1, \ldots, \alpha_{m+1}$ were equal to zero. Now we must have $\alpha_{m+1} \neq 0$,
since otherwise we would have to infer that the vectors $e_1, \ldots, e_m$ were linearly
dependent. But then from (3.11) we obtain that $e_{m+1} = \beta_1 e_1 + \cdots + \beta_m e_m$, where
$\beta_i = -\alpha_{m+1}^{-1}\alpha_i$; that is, the vector $e_{m+1}$ is a linear combination of the vectors
$e_1, \ldots, e_m$, contradicting our assumption.

The same reasoning can be applied to the system of vectors $e_1, \ldots, e_{m+1}$. Continuing
in this way, we will obtain a system containing an ever increasing number of
linearly independent vectors, and sooner or later, we will have to stop the process,
since the dimension of the space L is finite. But then every vector of the space L will
be a linear combination of the linearly independent vectors of our enlarged system.
That is, we will have produced a basis. □
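The step-by-step enlargement used in this proof can be imitated for rows of rational numbers. The sketch below is an illustration under assumptions: candidates are drawn from the rows with a single entry 1 (if the current rows do not span the whole space, at least one such row lies outside their span), and independence is tested by a hypothetical helper `rank` based on Gaussian elimination.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows left after Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def extend_to_basis(vectors, n):
    """Extend linearly independent rows of length n to a basis of the row space."""
    basis = [list(v) for v in vectors]
    for i in range(n):
        if len(basis) == n:
            break
        candidate = [1 if j == i else 0 for j in range(n)]
        # Keep the candidate only if it is not a combination of the rows so far.
        if rank(basis + [candidate]) == len(basis) + 1:
            basis.append(candidate)
    return basis

basis = extend_to_basis([(1, 1, 0)], 3)
```

The process stops once $n$ independent rows have been collected, which mirrors the remark that the enlargement must end because the dimension is finite.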

In the situation under consideration in Theorem 3.30, we shall say that the system
of vectors $e_1, \ldots, e_m$ has been augmented to the basis $e_1, \ldots, e_n$. As an easy
verification shows, this is equivalent to the relationship

$\langle e_1, \ldots, e_n \rangle = \langle e_1, \ldots, e_m \rangle \oplus \langle e_{m+1}, \ldots, e_n \rangle$. (3.12)


Corollary 3.31 For an arbitrary subspace $L' \subset L$ of the finite-dimensional vector
space L, there exists a subspace $L'' \subset L$ such that $L = L' \oplus L''$.

Proof It suffices to take any basis $e_1, \ldots, e_m$ of $L'$, augment it to a basis $e_1, \ldots, e_n$
of the space L, and set $L = \langle e_1, \ldots, e_n \rangle$, $L' = \langle e_1, \ldots, e_m \rangle$, and $L'' = \langle e_{m+1}, \ldots, e_n \rangle$
in (3.12). □

We shall now prove an assertion that is the central point of the entire theory.
Therefore, we shall present two proofs (although they are, in fact, based on the
same principle).

Lemma 3.32 More than $n$ linear combinations of $n$ vectors in an arbitrary vector
space are of necessity linearly dependent.


Proof First proof. Let us write down explicitly just what has to be proved. Suppose
we are given $n$ vectors $x_1, \ldots, x_n$ and $m$ linear combinations of them $y_1, \ldots, y_m$,
where $m > n$. Then we have the relationships

$y_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n$,
$y_2 = a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n$,
$\vdots$
$y_m = a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n$ (3.13)

for certain scalars $a_{ij}$. We now have to find scalars $\alpha_1, \ldots, \alpha_m$, not all of them equal
to zero, such that

$\alpha_1 y_1 + \alpha_2 y_2 + \cdots + \alpha_m y_m = 0$.

Substituting here (3.13) and collecting like terms, we obtain

$(\alpha_1 a_{11} + \alpha_2 a_{21} + \cdots + \alpha_m a_{m1}) x_1 + (\alpha_1 a_{12} + \alpha_2 a_{22} + \cdots + \alpha_m a_{m2}) x_2$
$+ \cdots + (\alpha_1 a_{1n} + \alpha_2 a_{2n} + \cdots + \alpha_m a_{mn}) x_n = 0$.

This equality will be satisfied if all the coefficients of the vectors $x_1, \ldots, x_n$ are
equal to zero, that is, if the equations

$a_{11}\alpha_1 + a_{21}\alpha_2 + \cdots + a_{m1}\alpha_m = 0$,
$a_{12}\alpha_1 + a_{22}\alpha_2 + \cdots + a_{m2}\alpha_m = 0$,
$\vdots$
$a_{1n}\alpha_1 + a_{2n}\alpha_2 + \cdots + a_{mn}\alpha_m = 0$

are satisfied. Since $m > n$ by assumption, we have $n$ homogeneous equations in
more than $n$ unknowns, namely $\alpha_1, \ldots, \alpha_m$. By Corollary 1.11, this system has a
nontrivial solution $\alpha_1, \ldots, \alpha_m$, which gives the assertion of the lemma.
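The first proof is constructive once a nontrivial solution of the homogeneous system is in hand. The sketch below shows one way to compute such a solution over the rationals by full Gaussian elimination; it is an assumed implementation, and the names `nontrivial_solution` and `dependence` are hypothetical. The matrix `A` holds the coefficients $a_{ij}$ of (3.13), so the equations of the system are its columns.

```python
from fractions import Fraction

def nontrivial_solution(equations, num_unknowns):
    """A nonzero solution of a homogeneous system with fewer equations than unknowns."""
    m = [[Fraction(a) for a in eq] for eq in equations]
    pivots = []  # pairs (row, column) of pivot positions
    r = 0
    for col in range(num_unknowns):
        p = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][col]
        m[r] = [a / pivot_value for a in m[r]]
        # Clear the column in every other row (reduced echelon form).
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append((r, col))
        r += 1
    pivot_cols = {c for _, c in pivots}
    # A free unknown exists because there are more unknowns than equations.
    free = next(c for c in range(num_unknowns) if c not in pivot_cols)
    solution = [Fraction(0)] * num_unknowns
    solution[free] = Fraction(1)
    for row, col in pivots:
        solution[col] = -m[row][free]
    return solution

def dependence(A):
    """Scalars alpha_1..alpha_m, not all zero, with sum_i alpha_i * a_ij = 0 for every j."""
    equations = [list(column) for column in zip(*A)]
    return nontrivial_solution(equations, len(A))

# Assumed example with m = 3 combinations of n = 2 vectors.
A = [[1, 2], [3, 4], [5, 6]]
alpha = dependence(A)
```

With these scalars, $\alpha_1 y_1 + \alpha_2 y_2 + \alpha_3 y_3 = 0$ holds whatever the vectors $x_1, x_2$ are, since every coefficient of $x_j$ in the collected sum vanishes.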

Second proof. This proof will be by induction on $n$ and based on formula (3.13).
The base case $n = 1$ of the induction is obvious: any $m$ vectors proportional to the
given vector $x_1$ will be linearly dependent if $m > 1$.

Now let us consider the case of arbitrary $n > 1$. In formula (3.13), suppose that
the coefficient $a_{11}$ is not equal to 0. We may make this assumption with no loss
of generality. Indeed, if in formula (3.13), all coefficients satisfy $a_{ij} = 0$, then all
the vectors $y_1, \ldots, y_m$ are equal to 0, and the lemma is true (trivially). But if
at least one coefficient $a_{ij}$ is not equal to 0, then by changing the numeration of
the vectors $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$, we can move this coefficient to the upper
left-hand corner and assume that $a_{11} \neq 0$. Let us now subtract from the vectors
$y_2, \ldots, y_m$ the vector $y_1$ with a coefficient such that in the relationships (3.13),
the vector $x_1$ is eliminated. After this, we obtain the vectors $y_2 - \gamma_2 y_1, \ldots, y_m - \gamma_m y_1$,
where $\gamma_2 = a_{11}^{-1} a_{21}, \ldots, \gamma_m = a_{11}^{-1} a_{m1}$. These $m - 1$ vectors are already linear
combinations of the $n - 1$ vectors $x_2, \ldots, x_n$. Since we are using induction on $n$, we
may assume the lemma to be true in this case. This means that there exist numbers
$\alpha_2, \ldots, \alpha_m$, not all zero, such that $\alpha_2(y_2 - \gamma_2 y_1) + \cdots + \alpha_m(y_m - \gamma_m y_1) = 0$, that
is,

$-(\gamma_2\alpha_2 + \cdots + \gamma_m\alpha_m) y_1 + \alpha_2 y_2 + \cdots + \alpha_m y_m = 0$,

which means that the vectors $y_1, \ldots, y_m$ are linearly dependent. □

It is apparent that in the second proof, we used the method of Gaussian elimination,
which was used to prove Theorem 1.10, which served as a basis of the first
proof. Thus both proofs are based on the same idea.

The connection between the notions of basis and dimension is made apparent in
the following result.

Theorem 3.33 If a vector space L has a basis of $n$ vectors, then its dimension is $n$.

Proof The proof of the theorem follows easily from the lemma. Let $e_1, \ldots, e_n$ be a
basis of the space L. We shall show that $\dim L = n$. In this space, there are $n$ linearly
independent vectors, for instance, the vectors $e_1, \ldots, e_n$ themselves. And since an
arbitrary vector of L is a linear combination of the vectors of a basis, then by the
lemma, there cannot exist a greater number of linearly independent vectors. □

Corollary 3.34 Theorem 3.33 shows that every basis of a (finite-dimensional) vector
space consists of the same number of vectors, equal to the dimension of the space.
Therefore, to determine the dimension of a vector space, it suffices to find any basis
in that space.


As a rule, this is a relatively easy task. For example, it is clear that in the space of
polynomials (in the variable $t$) of degree at most $n$, there is a basis consisting of the
polynomials $1, t, t^2, \ldots, t^n$. This implies that the dimension of the space is $n + 1$.


Example 3.35 Consider the vector space $\mathbb{K}^n$ of rows of length $n$ consisting of elements
of an arbitrary field $\mathbb{K}$. In this space, there is a basis consisting of the rows

$e_1 = (1, 0, 0, \ldots, 0)$,
$e_2 = (0, 1, 0, \ldots, 0)$,
$\vdots$
$e_n = (0, 0, 0, \ldots, 1)$. (3.14)

In Sect. 1.1, we verified in the proof of Theorem 1.3 that every row of length $n$ is a
linear combination of these $n$ rows. The same reasoning shows that these rows are
linearly independent. Indeed, suppose that $\alpha_1 e_1 + \cdots + \alpha_n e_n = 0$. As we have seen,
$\alpha_1 e_1 + \cdots + \alpha_n e_n$ is equal to $(\alpha_1, \ldots, \alpha_n)$. This means that $\alpha_1 = \cdots = \alpha_n = 0$. Thus
the dimension of the space $\mathbb{K}^n$ is $n$.


Example 3.36 Let $M$ be an arbitrary set. Let us denote by $F(M)$ the collection of all
functions on $M$ taking values in some field (the real numbers, complex numbers, or
an arbitrary field $\mathbb{K}$). The set $F(M)$ becomes a vector space if for $f_1 \in F(M)$ and
$f_2 \in F(M)$, we define the sum and multiplication by a scalar $\alpha$ using the formulas

$(f_1 + f_2)(x) = f_1(x) + f_2(x)$, $(\alpha f)(x) = \alpha f(x)$

for arbitrary $x \in M$.

Suppose that the set M is finite. Let us dénoté by 8 X (y) the function that is equal 
to 1 for y — x and is 0 for ail y x. Functions <5* (y) are called delta functions . 
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We shall show that they constitute a basis of the set F (M). Indeed, for any function 
/ g F(M) we hâve the obvious equality 

/00=£/(*)4c00. (3-15) 

xeM 

from which it follows that an arbitrary function in the space F (M) can be expressed 
as a linear combination of the 8 X , x g M. It is clear that the set of ail delta functions 
is linearly independent, that is, they form a basis of the vector space F (M). Silice 
the number of functions in this collection is equal to the number of éléments of the 
set M, the set F(M) is finite-dimensional, and dim F(M) is equal to the number 
of éléments in M. In the case that M = N n (see the définition on p. xi), then any 
function / g F (N n ) is uniquely determined by its values /(l), which 
are its coordinates in the décomposition (3.15) with respect to the basis 8 X , x e M. 
If we set ü[ — /(/), then the numbers (a\, ... ,a n ) form a row, and this shows that 
the vector space F(N n ) coincides with the space K. n . In particular, the basis of the 
space F(N n ) consisting of the delta functions coincides with the basis (3.14) of the 
space K' 1 . 
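
The decomposition (3.15) can be checked on a small case. The following is only an illustrative sketch: the three-element set `M`, the sample values of `f`, and the helper names `delta` and `expand` are assumptions made for the example, not part of the text.

```python
from fractions import Fraction

def delta(x):
    """Return the delta function delta_x: equal to 1 at y == x and 0 elsewhere."""
    return lambda y: 1 if y == x else 0

# A hypothetical finite set M and an arbitrary function f on it.
M = ["a", "b", "c"]
f = {"a": Fraction(2), "b": Fraction(-1), "c": Fraction(5)}

def expand(f, M):
    """Rebuild f from formula (3.15): f(y) = sum over x in M of f(x) * delta_x(y)."""
    return {y: sum(f[x] * delta(x)(y) for x in M) for y in M}

reconstructed = expand(f, M)
print(reconstructed == f)
```

The values $f(x)$ play the role of coordinates here, and there are as many delta functions as elements of `M`, in agreement with $\dim \mathsf{F}(M)$ being the number of elements of $M$.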

In many cases, Theorem 3.33 provides a simple method for finding the dimension 
of a vector space. 

Theorem 3.37 The dimension of a vector space $\langle \boldsymbol{x}_1, \ldots, \boldsymbol{x}_m \rangle$ is equal to the maximal
number of linearly independent vectors among the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$.

Therefore, even though the definition of dimension requires the consideration of
all the vectors in the space $\langle \boldsymbol{x}_1, \ldots, \boldsymbol{x}_m \rangle$, Theorem 3.37 makes it possible to limit
consideration to only the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$.
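
Theorem 3.37 suggests a direct procedure: run through the generators and keep each one that is independent of those already kept. The sketch below illustrates this under stated assumptions: vectors are rows of rational coordinates, independence is tested by Gaussian elimination, and the names `rank` and `maximal_independent` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            k = m[i][c] / m[r][c]
            m[i] = [a - k * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def maximal_independent(vectors):
    """Keep each vector that is linearly independent of the vectors kept so far."""
    kept = []
    for v in vectors:
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept

xs = [(1, 0, 1), (2, 0, 2), (0, 1, 1), (1, 1, 2)]
basis = maximal_independent(xs)
print(basis, len(basis) == rank(xs))
```

In this sample, the second vector is a multiple of the first and the fourth is the sum of the first and third, so two vectors survive and the span has dimension 2.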

Proof of Theorem 3.37 Let us set $\mathsf{L}' = \langle \boldsymbol{x}_1, \ldots, \boldsymbol{x}_m \rangle$ and denote by $l$ the maximum
number of linearly independent vectors among $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$. Changing the numbering
if necessary, we may suppose that the first $l$ vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_l$ are linearly
independent. Let $\mathsf{L}'' = \langle \boldsymbol{x}_1, \ldots, \boldsymbol{x}_l \rangle$. It is clear that $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_l$ form a basis of the space
$\mathsf{L}''$, and by Theorem 3.33, $\dim \mathsf{L}'' = l$. We shall prove that $\mathsf{L}'' = \mathsf{L}'$, which will give us
the result of Theorem 3.37. If $l = m$, then this is obvious. Suppose, then, that $l < m$.
Then by our assumption, for any $k = l + 1, \ldots, m$, the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_l, \boldsymbol{x}_k$ are linearly
dependent, that is, there is a linear combination $\alpha_1 \boldsymbol{x}_1 + \cdots + \alpha_l \boldsymbol{x}_l + \alpha_k \boldsymbol{x}_k = \boldsymbol{0}$
in which not all the coefficients are equal to zero. Furthermore, it is necessary that $\alpha_k \neq 0$,
since otherwise, we would obtain the linear dependence of the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_l$,
which contradicts the hypothesis. Then

$$\boldsymbol{x}_k = -\alpha_k^{-1}\alpha_1 \boldsymbol{x}_1 - \alpha_k^{-1}\alpha_2 \boldsymbol{x}_2 - \cdots - \alpha_k^{-1}\alpha_l \boldsymbol{x}_l,$$

that is, the vector $\boldsymbol{x}_k$ is in $\mathsf{L}''$. We have shown this for all $k > l$, but by construction, it
is also true for $k \le l$. This means that all vectors $\boldsymbol{x}_k$ are in the space $\mathsf{L}''$, and hence so
are all linear combinations of them. Therefore, not only do we have $\mathsf{L}'' \subset \mathsf{L}'$ (which
is obvious by construction), but also $\mathsf{L}' \subset \mathsf{L}''$, which shows that $\mathsf{L}'' = \mathsf{L}'$, as desired. □

Theorem 3.38 If $\mathsf{L}_1$ and $\mathsf{L}_2$ are two finite-dimensional vector spaces, then

$$\dim(\mathsf{L}_1 \oplus \mathsf{L}_2) = \dim \mathsf{L}_1 + \dim \mathsf{L}_2.$$

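Proof Let $\dim \mathsf{L}_1 = r$, $\dim \mathsf{L}_2 = s$, let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r$ be a basis of the space $\mathsf{L}_1$, and let
$\boldsymbol{f}_1, \ldots, \boldsymbol{f}_s$ be a basis of the space $\mathsf{L}_2$. We shall show that the collection of $r + s$
vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_s$ forms a basis of the space $\mathsf{L}_1 \oplus \mathsf{L}_2$. By the definition
of direct sum, every vector $\boldsymbol{x} \in \mathsf{L}_1 \oplus \mathsf{L}_2$ can be expressed in the form $\boldsymbol{x} = \boldsymbol{x}_1 + \boldsymbol{x}_2$,
where $\boldsymbol{x}_i \in \mathsf{L}_i$. But the vector $\boldsymbol{x}_1$ is a linear combination of the vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r$,
while the vector $\boldsymbol{x}_2$ is a linear combination of the vectors $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_s$. As a result,
we obtain a representation of the vector $\boldsymbol{x}$ as a linear combination of the $r + s$
vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r, \boldsymbol{f}_1, \ldots, \boldsymbol{f}_s$. The linear independence of these vectors is just as easily
verified. Suppose there exists a relationship

$$\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_r \boldsymbol{e}_r + \beta_1 \boldsymbol{f}_1 + \cdots + \beta_s \boldsymbol{f}_s = \boldsymbol{0}.$$

We set $\boldsymbol{x}_1 = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_r \boldsymbol{e}_r$ and $\boldsymbol{x}_2 = \beta_1 \boldsymbol{f}_1 + \cdots + \beta_s \boldsymbol{f}_s$. Then we have the
equality $\boldsymbol{x}_1 + \boldsymbol{x}_2 = \boldsymbol{0}$ with $\boldsymbol{x}_i \in \mathsf{L}_i$. From this, by the definition of the direct sum,
it follows that $\boldsymbol{x}_1 = \boldsymbol{0}$ and $\boldsymbol{x}_2 = \boldsymbol{0}$. From the linear independence of the vectors
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r$, it follows that $\alpha_1 = 0, \ldots, \alpha_r = 0$, and similarly, $\beta_1 = 0, \ldots, \beta_s = 0$. □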

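Corollary 3.39 For finite-dimensional spaces $\mathsf{L}_1, \mathsf{L}_2, \ldots, \mathsf{L}_k$ for arbitrary $k \ge 2$, we
have

$$\dim(\mathsf{L}_1 \oplus \mathsf{L}_2 \oplus \cdots \oplus \mathsf{L}_k) = \dim \mathsf{L}_1 + \dim \mathsf{L}_2 + \cdots + \dim \mathsf{L}_k.$$

Proof The assertion follows readily from Theorem 3.38 by induction on $k$. □

Corollary 3.40 If $\mathsf{L}_1, \ldots, \mathsf{L}_r$ and $\mathsf{L}$ are vector spaces such that $\mathsf{L} = \mathsf{L}_1 + \cdots + \mathsf{L}_r$,
and if $\dim \mathsf{L} = \dim \mathsf{L}_1 + \cdots + \dim \mathsf{L}_r$, then $\mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r$.

Proof We select a basis in each of the $\mathsf{L}_i$ and combine them into a system of
vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. By assumption, the number $n$ of vectors in this system is equal to
$\dim \mathsf{L}$, and $\mathsf{L} = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n \rangle$. By Theorem 3.37, the vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ are linearly
independent, and this implies that $\mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r$. □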

These considerations make it possible to give a more visual, geometric
characterization of the notion of linear dependence. Namely, let us prove that vectors
$\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$ are linearly dependent if and only if they are contained in a subspace $\mathsf{L}'$
of dimension less than $m$.

Indeed, let us denote by $l$ the largest number of linearly independent vectors
among $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$. Let us assume that these independent vectors are $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_l$ and
set $\mathsf{L}' = \langle \boldsymbol{x}_1, \ldots, \boldsymbol{x}_l \rangle$. Then for $l = m$, the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$ are linearly
independent, and our assertion follows from the definition of dimension. If $l < m$, then all
the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$ are contained in the subspace $\mathsf{L}'$, whose dimension, by
Theorem 3.33, is $l$, and the assertion is correct.

Using the concepts introduced thus far, it is possible to prove a useful
generalization of Theorem 3.38.

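Theorem 3.41 For any two finite-dimensional vector spaces $\mathsf{L}_1$ and $\mathsf{L}_2$, one has the
equality

$$\dim(\mathsf{L}_1 + \mathsf{L}_2) = \dim \mathsf{L}_1 + \dim \mathsf{L}_2 - \dim(\mathsf{L}_1 \cap \mathsf{L}_2). \tag{3.16}$$

Theorem 3.38 is obtained as a simple corollary of Theorem 3.41. Indeed, if
$\mathsf{L}_1 + \mathsf{L}_2 = \mathsf{L}_1 \oplus \mathsf{L}_2$, then by Corollary 3.15, the intersection $\mathsf{L}_1 \cap \mathsf{L}_2$ is equal to $(\boldsymbol{0})$, and it
remains only to use the fact that $\dim(\boldsymbol{0}) = 0$.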

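Proof of Theorem 3.41 Let us set $\mathsf{L}_0 = \mathsf{L}_1 \cap \mathsf{L}_2$. From Corollary 3.31, it follows that
there exist subspaces $\mathsf{L}_1' \subset \mathsf{L}_1$ and $\mathsf{L}_2' \subset \mathsf{L}_2$ such that

$$\mathsf{L}_1 = \mathsf{L}_0 \oplus \mathsf{L}_1', \qquad \mathsf{L}_2 = \mathsf{L}_0 \oplus \mathsf{L}_2'. \tag{3.17}$$

Formula (3.16) follows easily from the equality $\mathsf{L}_1 + \mathsf{L}_2 = \mathsf{L}_0 \oplus \mathsf{L}_1' \oplus \mathsf{L}_2'$. Indeed,
since $\mathsf{L}_0 = \mathsf{L}_1 \cap \mathsf{L}_2$, then in view of relationship (3.17) and Theorem 3.38, we obtain
$\mathsf{L}_1 + \mathsf{L}_2 = \mathsf{L}_1 \oplus \mathsf{L}_2'$, and therefore,

$$\dim(\mathsf{L}_1 + \mathsf{L}_2) = \dim \mathsf{L}_1 + \dim \mathsf{L}_2' = \dim \mathsf{L}_1 + \dim \mathsf{L}_2 - \dim \mathsf{L}_0,$$

which yields relationship (3.16).

Let us prove that $\mathsf{L}_1 + \mathsf{L}_2 = \mathsf{L}_0 \oplus \mathsf{L}_1' \oplus \mathsf{L}_2'$. It is clear that each subspace $\mathsf{L}_0$, $\mathsf{L}_1'$, $\mathsf{L}_2'$
is contained in $\mathsf{L}_1 + \mathsf{L}_2$, so that their sum $\mathsf{L}_0 + \mathsf{L}_1' + \mathsf{L}_2'$ is also contained in $\mathsf{L}_1 + \mathsf{L}_2$.
But an arbitrary vector $\boldsymbol{z} \in \mathsf{L}_1 + \mathsf{L}_2$ can be represented in the form $\boldsymbol{z} = \boldsymbol{x} + \boldsymbol{y}$, where
$\boldsymbol{x} \in \mathsf{L}_1$, $\boldsymbol{y} \in \mathsf{L}_2$, and in view of relationship (3.17), we have the representations
$\boldsymbol{x} = \boldsymbol{u} + \boldsymbol{v}$ and $\boldsymbol{y} = \boldsymbol{u}' + \boldsymbol{w}$, where $\boldsymbol{u}, \boldsymbol{u}' \in \mathsf{L}_0$, $\boldsymbol{v} \in \mathsf{L}_1'$, $\boldsymbol{w} \in \mathsf{L}_2'$, from which we obtain
$\boldsymbol{z} = \boldsymbol{x} + \boldsymbol{y} = (\boldsymbol{u} + \boldsymbol{u}') + \boldsymbol{v} + \boldsymbol{w}$, and this means that the vector $\boldsymbol{z}$ is contained in
$\mathsf{L}_0 + \mathsf{L}_1' + \mathsf{L}_2'$. From this, it follows that

$$\mathsf{L}_1 + \mathsf{L}_2 = \mathsf{L}_0 + \mathsf{L}_1' + \mathsf{L}_2' = \mathsf{L}_1 + \mathsf{L}_2'.$$

But $\mathsf{L}_1 \cap \mathsf{L}_2' = (\boldsymbol{0})$, since a vector $\boldsymbol{x} \in \mathsf{L}_1 \cap \mathsf{L}_2'$ is contained both in $\mathsf{L}_1 \cap \mathsf{L}_2 = \mathsf{L}_0$ and
in $\mathsf{L}_2'$, while in view of (3.17), the intersection $\mathsf{L}_0 \cap \mathsf{L}_2'$ is equal to $(\boldsymbol{0})$. As a result,
we obtain the required equality

$$\mathsf{L}_1 + \mathsf{L}_2 = (\mathsf{L}_0 \oplus \mathsf{L}_1') + \mathsf{L}_2' = (\mathsf{L}_0 \oplus \mathsf{L}_1') \oplus \mathsf{L}_2' = \mathsf{L}_0 \oplus \mathsf{L}_1' \oplus \mathsf{L}_2',$$

which, as we have seen, proves Theorem 3.41. □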

Corollary 3.42 Let $\mathsf{L}_1$ and $\mathsf{L}_2$ be subspaces of a finite-dimensional vector space $\mathsf{L}$.
Then from the inequality $\dim \mathsf{L}_1 + \dim \mathsf{L}_2 > \dim \mathsf{L}$, it follows that $\mathsf{L}_1 \cap \mathsf{L}_2 \neq (\boldsymbol{0})$, that
is, the subspaces $\mathsf{L}_1$ and $\mathsf{L}_2$ have a nonzero vector in common.

Indeed, in this case, $\mathsf{L}_1 + \mathsf{L}_2 \subset \mathsf{L}$, which means that $\dim(\mathsf{L}_1 + \mathsf{L}_2) \le \dim \mathsf{L}$. Taking
this into account, we obtain from (3.16) that

$$\dim(\mathsf{L}_1 \cap \mathsf{L}_2) = \dim \mathsf{L}_1 + \dim \mathsf{L}_2 - \dim(\mathsf{L}_1 + \mathsf{L}_2) \ge \dim \mathsf{L}_1 + \dim \mathsf{L}_2 - \dim \mathsf{L} > 0,$$

from which it follows that $\mathsf{L}_1 \cap \mathsf{L}_2 \neq (\boldsymbol{0})$.

For example, two planes passing through the origin in three-dimensional space
have a straight line in common.
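
The example of two planes can be checked numerically against formula (3.16). The sketch below is illustrative only: the two coordinate planes, the helper `rank`, and the variable names are assumptions chosen for the example.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            k = m[i][c] / m[r][c]
            m[i] = [a - k * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

L1 = [(1, 0, 0), (0, 1, 0)]  # generators of the plane z = 0
L2 = [(0, 1, 0), (0, 0, 1)]  # generators of the plane x = 0

dim_sum = rank(L1 + L2)                  # dim(L1 + L2): rank of all generators together
dim_cap = rank(L1) + rank(L2) - dim_sum  # dim(L1 ∩ L2), solved from (3.16)

# A vector lies in a span exactly when adjoining it does not raise the rank.
common = (0, 1, 0)
in_both = rank(L1 + [common]) == rank(L1) and rank(L2 + [common]) == rank(L2)
print(dim_sum, dim_cap, in_both)
```

Here $\dim \mathsf{L}_1 + \dim \mathsf{L}_2 = 4 > 3$, so Corollary 3.42 guarantees a common nonzero vector; in this sample the common line is spanned by $(0, 1, 0)$.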

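We shall now obtain an expression for the dimension of a subspace $\langle \boldsymbol{a}_1, \ldots, \boldsymbol{a}_m \rangle$
using the theory of determinants. Let $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ be vectors in the space $\mathsf{L}$, and let
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ be some basis of $\mathsf{L}$. We shall write the coordinates of the vector $\boldsymbol{a}_i$ in this
basis as the $i$th row of a matrix $A$:

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$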


Theorem 3.43 The dimension of the vector space $\langle \boldsymbol{a}_1, \ldots, \boldsymbol{a}_m \rangle$ is equal to the rank
of the matrix $A$.

Proof The linear dependence of the vectors $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_k$ for $k \le m$ is equivalent to
the linear dependence of the rows of the matrix $A$ consisting of the same numbers.
In Theorem 2.41 we proved that if the rank of a matrix is equal to $r$, then all of
its rows are linear combinations of some collection of $r$ of its rows. From this it
follows already that $\dim \langle \boldsymbol{a}_1, \ldots, \boldsymbol{a}_m \rangle \le r$. But in fact, from the proof of the same
Theorem 2.41, it follows that for such a collection of $r$ rows, one may take any $r$
rows of the matrix in which there is a nonzero minor of order $r$ (see the remark
following Theorem 2.41). Let us show that such a collection of $r$ rows is linearly
independent, from which Theorem 3.43 will follow. We may
assume that a nonzero minor $M_r$ is located in the first $r$ columns and first $r$ rows
of the matrix $A$. We then have to establish the linear independence of the vectors
$\boldsymbol{a}_1, \ldots, \boldsymbol{a}_r$. If we assume that $\alpha_1 \boldsymbol{a}_1 + \cdots + \alpha_r \boldsymbol{a}_r = \boldsymbol{0}$, then if we focus attention on
only the first $r$ coordinates of the vectors, we obtain $r$ homogeneous linear equations
in the unknown coefficients $\alpha_1, \ldots, \alpha_r$. It is easy to see that the determinant of the
matrix of this system is equal to $M_r \neq 0$, and as a consequence, it has a unique
solution, which is the zero solution: $\alpha_1 = 0, \ldots, \alpha_r = 0$. That is, the vectors $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_r$
are indeed linearly independent. □

In the past, Theorem 3.43 was formulated in the following form, which is also
sometimes useful. Consider the vector space $\mathbb{K}^n$ of rows of length $n$ (where $\mathbb{K}$ is the
field of real numbers, the field of complex numbers, or an arbitrary field). Then the
vectors will be rows of length $n$ (in our case, the rows of the matrix $A$). From the
proof of Theorem 3.43 we have at once the following corollary.

Fig. 3.4 Hyperplanes in a vector space

Corollary 3.44 The rank of a matrix A is equal to the largest number of linearly 
independent rows of A. 

From this, we obtain the following unexpected result.

Corollary 3.45 The rank of a matrix $A$ is also equal to the largest number of
linearly independent columns of $A$.

This follows at once from the definition of the rank of a matrix and Theorem 2.32.
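
Corollaries 3.44 and 3.45 can be observed on a concrete matrix by computing the largest number of independent rows and the largest number of independent columns separately. The following is a sketch under stated assumptions: the sample matrix and the helper `rank` (elimination over the rationals) are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Largest number of linearly independent rows, found by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            k = m[i][c] / m[r][c]
            m[i] = [a - k * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [(1, 2, 3, 4),
     (2, 4, 6, 8),   # twice the first row
     (0, 1, 1, 0)]
A_transposed = list(zip(*A))  # the columns of A, written as rows

row_rank = rank(A)
col_rank = rank(A_transposed)
print(row_rank, col_rank)
```

The matrix has three rows and four columns, yet both counts agree.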

To conclude this section, let us examine in greater detail the case of real vector 
spaces, and to this end, introduce some important notions that will be used in the 
sequel. 

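Let $\mathsf{L}'$ be a hyperplane in the finite-dimensional vector space $\mathsf{L}$, that is, $\dim \mathsf{L}' =
\dim \mathsf{L} - 1$. Then this hyperplane divides $\mathsf{L}$ into two parts, as shown in Fig. 3.4 for
the case of a line and a plane.

Indeed, since $\mathsf{L}' \neq \mathsf{L}$, there exists a vector $\boldsymbol{e} \in \mathsf{L}$, $\boldsymbol{e} \notin \mathsf{L}'$. From this, it follows that
$\mathsf{L} = \mathsf{L}' \oplus \langle \boldsymbol{e} \rangle$. For according to the choice of $\boldsymbol{e}$, the intersection $\mathsf{L}' \cap \langle \boldsymbol{e} \rangle$ is equal to
$(\boldsymbol{0})$, and by Theorem 3.38, we have the equality

$$\dim(\mathsf{L}' \oplus \langle \boldsymbol{e} \rangle) = \dim \mathsf{L}' + 1 = \dim \mathsf{L},$$

from which we obtain, with the help of Theorem 3.24, that $\mathsf{L}' \oplus \langle \boldsymbol{e} \rangle = \mathsf{L}$. Thus an
arbitrary vector $\boldsymbol{x} \in \mathsf{L}$ can be uniquely expressed in the form

$$\boldsymbol{x} = \alpha \boldsymbol{e} + \boldsymbol{u}, \qquad \boldsymbol{u} \in \mathsf{L}', \tag{3.18}$$

where $\alpha$ is some scalar. Since the scalars in our case are real, it makes sense to talk
about their sign. The collection of vectors $\boldsymbol{x}$ expressed as in (3.18) for which $\alpha > 0$
is denoted by $\mathsf{L}^+$. Likewise, the set of vectors $\boldsymbol{x}$ of the form (3.18) for which $\alpha < 0$
is denoted by $\mathsf{L}^-$. The sets $\mathsf{L}^+$ and $\mathsf{L}^-$ are called half-spaces of the space $\mathsf{L}$. Clearly,
$\mathsf{L} \setminus \mathsf{L}' = \mathsf{L}^+ \cup \mathsf{L}^-$.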

Of course, our construction depends not only on the hyperplane $\mathsf{L}'$, but also on the
choice of the vector $\boldsymbol{e} \notin \mathsf{L}'$. It is important to note that with a change in the vector $\boldsymbol{e}$,
the half-spaces $\mathsf{L}^+$ and $\mathsf{L}^-$ might change, but the pair $(\mathsf{L}^+, \mathsf{L}^-)$ will remain as before;
that is, either the half-spaces do not change at all, or else they exchange places. Indeed,
let $\boldsymbol{e}' \notin \mathsf{L}'$ be some other vector. Then it can be represented in the form $\boldsymbol{e}' = \lambda \boldsymbol{e} + \boldsymbol{v}$,
where the number $\lambda$ is nonzero and $\boldsymbol{v}$ is in $\mathsf{L}'$. This means that $\boldsymbol{e} = \lambda^{-1}(\boldsymbol{e}' - \boldsymbol{v})$. Then
for an arbitrary vector $\boldsymbol{x}$ from (3.18), we obtain, as in (3.18), the representation

$$\boldsymbol{x} = \alpha \lambda^{-1}(\boldsymbol{e}' - \boldsymbol{v}) + \boldsymbol{u} = \alpha \lambda^{-1} \boldsymbol{e}' + \boldsymbol{u}', \qquad \boldsymbol{u}' \in \mathsf{L}',$$

where $\boldsymbol{u}' = \boldsymbol{u} - \alpha \lambda^{-1} \boldsymbol{v}$, and we see that in passing from $\boldsymbol{e}$ to $\boldsymbol{e}'$, the scalar $\alpha$ in the
decomposition (3.18) is multiplied by $\lambda^{-1}$. Hence the half-spaces $\mathsf{L}^+$ and $\mathsf{L}^-$ do not
change if $\lambda > 0$, and they exchange places if $\lambda < 0$.

The above definition of decomposition of a real vector space $\mathsf{L}$ by a hyperplane
$\mathsf{L}'$ has a natural interpretation in topological terms (see pp. xvii-xix). Readers not
interested in this aspect of these ideas can skip the following five paragraphs.

If we wish to use topological terminology, then we have to introduce
on $\mathsf{L}$ the notion of convergence of a sequence of vectors. We shall do this using the
notion of a metric (see p. xviii). Let us choose in $\mathsf{L}$ an arbitrary basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, and
for vectors $\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n$ and $\boldsymbol{y} = \beta_1 \boldsymbol{e}_1 + \cdots + \beta_n \boldsymbol{e}_n$, we define the number
$r(\boldsymbol{x}, \boldsymbol{y})$ by means of the formula

$$r(\boldsymbol{x}, \boldsymbol{y}) = |\alpha_1 - \beta_1| + \cdots + |\alpha_n - \beta_n|.$$

It easily follows from the properties of absolute value that all three conditions
in the definition of a metric space are satisfied. Thus the vector space $\mathsf{L}$ and all
of its subspaces are metric spaces with the metric $r(\boldsymbol{x}, \boldsymbol{y})$, and for a sequence
of vectors there is automatically defined the notion of convergence: $\boldsymbol{x}_k \to \boldsymbol{x}$ as
$k \to \infty$ if $r(\boldsymbol{x}_k, \boldsymbol{x}) \to 0$ as $k \to \infty$. In other words, if $\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n$ and
$\boldsymbol{x}_k = \alpha_{1k} \boldsymbol{e}_1 + \cdots + \alpha_{nk} \boldsymbol{e}_n$, then the convergence $\boldsymbol{x}_k \to \boldsymbol{x}$ is equivalent to the
convergence of the $n$ coordinate sequences: $\alpha_{ik} \to \alpha_i$ for all $i = 1, \ldots, n$. We observe
that in the definition of $r(\boldsymbol{x}, \boldsymbol{y})$, we have used the coordinates of the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$
in some basis, and consequently, the metric obtained depends on the choice of
basis. Nevertheless, the notion of convergence does not depend on the choice of basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. This follows easily from the formulas (3.35) relating the coordinates of
a vector in various bases, which will be presented later.
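
The metric $r(\boldsymbol{x}, \boldsymbol{y})$ and the coordinatewise meaning of convergence can be illustrated with a short computation. This is a sketch only: vectors are represented by their coordinate rows in a fixed basis, and the sample sequence $\boldsymbol{x}_k$ is a hypothetical choice.

```python
from fractions import Fraction

def r(x, y):
    """Distance r(x, y) = |a_1 - b_1| + ... + |a_n - b_n| between coordinate rows."""
    return sum(abs(a - b) for a, b in zip(x, y))

x = (Fraction(1), Fraction(-2))

# A hypothetical sequence x_k whose coordinates tend to those of x.
seq = [(1 + Fraction(1, k), -2 + Fraction(1, k * k)) for k in range(1, 6)]
dists = [r(xk, x) for xk in seq]
print(dists)
```

Each distance equals $1/k + 1/k^2$, so the distances tend to zero exactly when both coordinate sequences converge.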

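The meaning of the partition $\mathsf{L} \setminus \mathsf{L}' = \mathsf{L}^+ \cup \mathsf{L}^-$ consists in the fact that the metric
space $\mathsf{L} \setminus \mathsf{L}'$ is not path-connected, while $\mathsf{L}^+$ and $\mathsf{L}^-$ are its path-connected
components.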

Indeed, let us suppose that in the metric space $\mathsf{L} \setminus \mathsf{L}'$, there exists a deformation
of the vector $\boldsymbol{x}$ to $\boldsymbol{y}$, that is, a continuous mapping $f : [0, 1] \to \mathsf{L} \setminus \mathsf{L}'$ such that
$f(0) = \boldsymbol{x}$ and $f(1) = \boldsymbol{y}$. Then by formula (3.18), we have the representation

$$\boldsymbol{x} = \alpha \boldsymbol{e} + \boldsymbol{u}, \qquad \boldsymbol{y} = \beta \boldsymbol{e} + \boldsymbol{v}, \qquad f(t) = \gamma(t) \boldsymbol{e} + \boldsymbol{w}(t), \tag{3.19}$$

where $\boldsymbol{u}, \boldsymbol{v} \in \mathsf{L}'$ and $\boldsymbol{w}(t) \in \mathsf{L}'$ for all $t \in [0, 1]$, and $\gamma(t)$ is a function taking real
values, continuous on the interval $[0, 1]$, and moreover, $\gamma(0) = \alpha$ and $\gamma(1) = \beta$.

If $\boldsymbol{x} \in \mathsf{L}^+$ and $\boldsymbol{y} \in \mathsf{L}^-$, then $\alpha > 0$ and $\beta < 0$, and by properties of continuous
functions known from calculus, $\gamma(\tau) = 0$ for some $0 < \tau < 1$. But then the vector
$f(\tau) = \boldsymbol{w}(\tau)$ is contained within the hyperplane $\mathsf{L}'$, and it follows that vectors $\boldsymbol{x}$ and
$\boldsymbol{y}$ cannot be deformed into each other in the set $\mathsf{L} \setminus \mathsf{L}'$. Therefore, the metric space
$\mathsf{L} \setminus \mathsf{L}'$ is not path-connected. But if $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}^+$ or $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}^-$, then in the
representations (3.19) for these vectors, the numbers $\alpha$ and $\beta$ have the same sign. Then, as is
easily seen, the mapping $f(t) = (1 - t)\boldsymbol{x} + t\boldsymbol{y}$, $t \in [0, 1]$, determines a continuous
deformation of $\boldsymbol{x}$ to $\boldsymbol{y}$ in the set $\mathsf{L}^+$ or $\mathsf{L}^-$, respectively.

Fig. 3.5 Bases assigning one and the same flag

From these considerations, it is easy to obtain a proof of the previous assertion
without using any formulas.

If we distinguish one of the two half-spaces $\mathsf{L}^+$ and $\mathsf{L}^-$ (we shall denote the
half-space thus distinguished by $\mathsf{L}^+$), then the pair $(\mathsf{L}, \mathsf{L}')$ is said to be directed.
For example, in the case of a line (Fig. 3.4(a)), this corresponds to a choice of the
direction of the line $\mathsf{L}$.

Using these concepts, we can obtain a more visual idea of the notion of basis (in 
the case of a real vector space). 

Definition 3.46 A flag in a finite-dimensional vector space $\mathsf{L}$ is a sequence of
subspaces

$$(\boldsymbol{0}) \subset \mathsf{L}_1 \subset \mathsf{L}_2 \subset \cdots \subset \mathsf{L}_n = \mathsf{L} \tag{3.20}$$

such that

(a) $\dim \mathsf{L}_i = i$ for all $i = 1, \ldots, n$.

(b) Each pair $(\mathsf{L}_i, \mathsf{L}_{i-1})$ is directed.

It is clear that in view of condition (a), the subspace $\mathsf{L}_{i-1}$ is a hyperplane in $\mathsf{L}_i$,
and therefore the above definition of directedness is applicable.

Every basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of a space $\mathsf{L}$ determines a particular flag. Namely, we set
$\mathsf{L}_i = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_i \rangle$, and to apply directedness to the pair $(\mathsf{L}_i, \mathsf{L}_{i-1})$, we select in the
collection of half-spaces the one determined by the vector $\boldsymbol{e}_i$ (clearly, $\boldsymbol{e}_i \notin \mathsf{L}_{i-1}$).

However, we must observe that different bases of the space $\mathsf{L}$ can determine one
and the same flag. For example, in Fig. 3.5, the bases $(\boldsymbol{e}_1, \boldsymbol{e}_2)$ and $(\boldsymbol{e}_1, \boldsymbol{e}_2')$ determine
the same flag in the plane. But later, in Sect. 7.2, we shall meet a situation in which
there is defined a bijection between the bases of a vector space and its flags (this is
accomplished through the selection of some special bases).


3.3 Linear Transformations of Vector Spaces 

Here we shall present a very broad generalization of the notion of linear function,
with which our course began. The generalization occurs in two aspects. First, in
Sect. 1.1, a linear function was defined as a function of rows of length $n$. Here, we
shall replace the rows of given length with vectors of an arbitrary vector space $\mathsf{L}$.
Second, the value of the linear function in Sect. 1.1 was considered a number, that
is, in other words, an element of the space $\mathbb{R}^1$ or $\mathbb{C}^1$ or $\mathbb{K}^1$ for an arbitrary field $\mathbb{K}$.
We shall now replace the numbers with vectors in an arbitrary vector space $\mathsf{M}$. Thus
our definition will include two vector spaces $\mathsf{L}$ and $\mathsf{M}$. The reader may consider both
spaces real, complex, or defined over an arbitrary field $\mathbb{K}$, but it must be the same
field for both $\mathsf{L}$ and $\mathsf{M}$. In this case, we shall speak about the elements of the field
using the same conventions that we established in Sect. 3.1 for scalars (see p. 82).

Let us recall that a linear function is defined by properties (1.8) and (1.9),
presented in Theorem 1.3 on page 3. The following definition is analogous to this.


Definition 3.47 A linear transformation of a vector space $\mathsf{L}$ to another vector space
$\mathsf{M}$ is a mapping $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ that assigns to each vector $\boldsymbol{x} \in \mathsf{L}$ some vector $\mathcal{A}(\boldsymbol{x}) \in \mathsf{M}$
and exhibits the following properties:

$$
\begin{aligned}
\mathcal{A}(\boldsymbol{x} + \boldsymbol{y}) &= \mathcal{A}(\boldsymbol{x}) + \mathcal{A}(\boldsymbol{y}), \\
\mathcal{A}(\alpha \boldsymbol{x}) &= \alpha \mathcal{A}(\boldsymbol{x})
\end{aligned}
\tag{3.21}
$$

for every scalar $\alpha$ and all vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ in the space $\mathsf{L}$.


A linear transformation is also called an operator or (only in the case that M = L) 
an endomorphism. 

Let us note one obvious but useful property that follows directly from the
definitions.


Proposition 3.48 Under any linear transformation, the image of the null vector is
the null vector. More precisely, since we may be dealing with two different vector
spaces, we might reformulate the statement in the following form: if $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a
linear transformation, and $\boldsymbol{0} \in \mathsf{L}$ and $\boldsymbol{0}' \in \mathsf{M}$ are the null vectors in the vector spaces
$\mathsf{L}$ and $\mathsf{M}$, then $\mathcal{A}(\boldsymbol{0}) = \boldsymbol{0}'$.

Proof By the definition of a vector space, for an arbitrary vector $\boldsymbol{x} \in \mathsf{L}$, there exists
an additive inverse $-\boldsymbol{x} \in \mathsf{L}$, that is, a vector such that $\boldsymbol{x} + (-\boldsymbol{x}) = \boldsymbol{0}$, and moreover
(see p. 82), the vector $-\boldsymbol{x}$ is obtained by multiplying $\boldsymbol{x}$ by the number $-1$. Applying
the linear transformation $\mathcal{A}$ to both sides of the equality $\boldsymbol{0} = \boldsymbol{x} + (-\boldsymbol{x})$, in view
of properties (3.21), we obtain $\mathcal{A}(\boldsymbol{0}) = \mathcal{A}(\boldsymbol{x}) - \mathcal{A}(\boldsymbol{x}) = \boldsymbol{0}'$, since for the vector $\mathcal{A}(\boldsymbol{x})$
of the space $\mathsf{M}$, the vector $-\mathcal{A}(\boldsymbol{x})$ is its additive inverse, and their sum is $\boldsymbol{0}'$. □

Example 3.49 For an arbitrary vector space $\mathsf{L}$, the identity mapping defines a linear
transformation $\mathcal{E}(\boldsymbol{x}) = \boldsymbol{x}$, for every $\boldsymbol{x} \in \mathsf{L}$, from the space $\mathsf{L}$ to itself.

Example 3.50 A rotation of the plane $\mathbb{R}^2$ through some angle about the origin is
a linear transformation (here $\mathsf{L} = \mathsf{M} = \mathbb{R}^2$). The conditions of (3.21) are clearly
satisfied here.




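Example 3.51 If $\mathsf{L}$ is the space of continuously differentiable functions on an
interval $[a, b]$, and $\mathsf{M}$ is the space of continuous functions on the same interval, and
if for $\boldsymbol{x} = f(t)$, we define $\mathcal{A}(\boldsymbol{x}) = f'(t)$, then the mapping $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a linear
transformation.

Example 3.52 If $\mathsf{L}$ is the space of twice continuously differentiable functions on
an interval $[a, b]$, $\mathsf{M}$ is the same space as in the previous example, $q(t)$ is some
continuous function on the interval $[a, b]$, and for $\boldsymbol{x} = f(t)$ we set $\mathcal{A}(\boldsymbol{x}) = f''(t) +
q(t) f(t)$, then the mapping $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a linear transformation. In analysis, it is
known as the Sturm-Liouville operator.

Example 3.53 Let $\mathsf{L}$ be the space of all polynomials, and for $\boldsymbol{x} = f(t)$, as in
Example 3.51, we set $\mathcal{A}(\boldsymbol{x}) = f'(t)$. Clearly, $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ is a linear transformation (that is,
here we have $\mathsf{M} = \mathsf{L}$). But if $\mathsf{L}$ is the space of polynomials of degree at most $n$, and
$\mathsf{M}$ is the space of polynomials of degree at most $n - 1$, then the same formula gives
a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$.

Example 3.54 Suppose we are given the representation of a space $\mathsf{L}$ as a direct
sum of two subspaces: $\mathsf{L} = \mathsf{L}' \oplus \mathsf{L}''$. This means that every vector $\boldsymbol{x} \in \mathsf{L}$ can be
uniquely represented in the form $\boldsymbol{x} = \boldsymbol{x}' + \boldsymbol{x}''$, where $\boldsymbol{x}' \in \mathsf{L}'$ and $\boldsymbol{x}'' \in \mathsf{L}''$. Assigning
to each vector $\boldsymbol{x} \in \mathsf{L}$ the term $\boldsymbol{x}' \in \mathsf{L}'$ in this representation gives a mapping
$\mathcal{P} : \mathsf{L} \to \mathsf{L}'$, $\mathcal{P}(\boldsymbol{x}) = \boldsymbol{x}'$. A simple verification shows that $\mathcal{P}$ is a linear transformation. It is
called the projection onto the subspace $\mathsf{L}'$ parallel to $\mathsf{L}''$. In this case, for the vector
$\boldsymbol{x} \in \mathsf{L}$, its image $\mathcal{P}(\boldsymbol{x}) \in \mathsf{L}'$ is called the projection vector of $\boldsymbol{x}$ onto $\mathsf{L}'$ parallel to $\mathsf{L}''$.
Analogously, for any subset $X \subset \mathsf{L}$, its image $\mathcal{P}(X) \subset \mathsf{L}'$ is called the projection of
$X$ onto $\mathsf{L}'$ parallel to $\mathsf{L}''$.

Example 3.55 Let $\mathsf{L} = \mathsf{M}$ and $\dim \mathsf{L} = \dim \mathsf{M} = 1$. Then $\mathsf{L} = \mathsf{M} = \langle \boldsymbol{e} \rangle$, where $\boldsymbol{e}$ is
some nonnull vector and $\mathcal{A}(\boldsymbol{e}) = \alpha \boldsymbol{e}$, where $\alpha$ is a scalar. From the definition of
a linear transformation, it follows directly that $\mathcal{A}(\boldsymbol{x}) = \alpha \boldsymbol{x}$ for every vector $\boldsymbol{x} \in \mathsf{L}$.
Consequently, such is the general form of all linear transformations $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ in
the case $\dim \mathsf{L} = 1$.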

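In the sequel, we shall consider the case that the dimensions of the spaces $\mathsf{L}$ and
$\mathsf{M}$ are finite. This means that in $\mathsf{L}$, there exists some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, and in $\mathsf{M}$, there
is a basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$. Then every vector $\boldsymbol{x} \in \mathsf{L}$ can be written in the form

$$\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \alpha_2 \boldsymbol{e}_2 + \cdots + \alpha_n \boldsymbol{e}_n.$$

Using the relationship (3.21) several times, we obtain that for any linear
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$, the image of the vector $\boldsymbol{x}$ is equal to

$$\mathcal{A}(\boldsymbol{x}) = \alpha_1 \mathcal{A}(\boldsymbol{e}_1) + \alpha_2 \mathcal{A}(\boldsymbol{e}_2) + \cdots + \alpha_n \mathcal{A}(\boldsymbol{e}_n). \tag{3.22}$$

The vectors $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_n)$ belong to the space $\mathsf{M}$, and by the definition of a
basis, they are linear combinations of the vectors $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$, that is,

$$
\begin{aligned}
\mathcal{A}(\boldsymbol{e}_1) &= a_{11} \boldsymbol{f}_1 + a_{21} \boldsymbol{f}_2 + \cdots + a_{m1} \boldsymbol{f}_m, \\
\mathcal{A}(\boldsymbol{e}_2) &= a_{12} \boldsymbol{f}_1 + a_{22} \boldsymbol{f}_2 + \cdots + a_{m2} \boldsymbol{f}_m, \\
&\;\;\vdots \\
\mathcal{A}(\boldsymbol{e}_n) &= a_{1n} \boldsymbol{f}_1 + a_{2n} \boldsymbol{f}_2 + \cdots + a_{mn} \boldsymbol{f}_m.
\end{aligned}
\tag{3.23}
$$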

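On the other hand, the image $\mathcal{A}(\boldsymbol{x})$ of the vector $\boldsymbol{x}$ belonging to the space $\mathsf{M}$ has in
the basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$ certain coordinates $\beta_1, \ldots, \beta_m$, that is, it can be written in the
form

$$\mathcal{A}(\boldsymbol{x}) = \beta_1 \boldsymbol{f}_1 + \beta_2 \boldsymbol{f}_2 + \cdots + \beta_m \boldsymbol{f}_m, \tag{3.24}$$

and moreover, such a representation is unique.

Substituting in (3.22) the expression (3.23) for $\mathcal{A}(\boldsymbol{e}_i)$ and grouping terms as
necessary, we obtain a representation of $\mathcal{A}(\boldsymbol{x})$ in the form

$$
\begin{aligned}
\mathcal{A}(\boldsymbol{x}) &= \alpha_1 (a_{11} \boldsymbol{f}_1 + a_{21} \boldsymbol{f}_2 + \cdots + a_{m1} \boldsymbol{f}_m) + \cdots \\
&\quad + \alpha_n (a_{1n} \boldsymbol{f}_1 + a_{2n} \boldsymbol{f}_2 + \cdots + a_{mn} \boldsymbol{f}_m) \\
&= (\alpha_1 a_{11} + \alpha_2 a_{12} + \cdots + \alpha_n a_{1n}) \boldsymbol{f}_1 + \cdots \\
&\quad + (\alpha_1 a_{m1} + \alpha_2 a_{m2} + \cdots + \alpha_n a_{mn}) \boldsymbol{f}_m.
\end{aligned}
$$

Because of the uniqueness of the decomposition (3.24), we thus obtain
expressions for the coordinates $\beta_1, \ldots, \beta_m$ of the vector $\mathcal{A}(\boldsymbol{x})$ in terms of the coordinates
$\alpha_1, \ldots, \alpha_n$ of the vector $\boldsymbol{x}$:

$$
\begin{aligned}
\beta_1 &= a_{11} \alpha_1 + a_{12} \alpha_2 + \cdots + a_{1n} \alpha_n, \\
\beta_2 &= a_{21} \alpha_1 + a_{22} \alpha_2 + \cdots + a_{2n} \alpha_n, \\
&\;\;\vdots \\
\beta_m &= a_{m1} \alpha_1 + a_{m2} \alpha_2 + \cdots + a_{mn} \alpha_n.
\end{aligned}
\tag{3.25}
$$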


Formula (3.25) gives us an explicit expression for the action of the linear
transformation $\mathcal{A}$ for the chosen coordinates (that is, bases) of the spaces $\mathsf{L}$ and $\mathsf{M}$. This
expression is a linear substitution of variables with the matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\tag{3.26}
$$

consisting of the coefficients that enter into the formula (3.25). The matrix $A$ is of
type $(m, n)$ and is the transpose of the matrix consisting of the coefficients of the
linear combinations in formula (3.23).

Definition 3.56 The matrix $A$ in (3.26) is called the matrix of the linear
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ given by formula (3.23) in the bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$.

In other words, the matrix $A$ of the linear transformation $\mathcal{A}$ is a matrix whose
$i$th column consists of the coordinates of the vector $\mathcal{A}(\boldsymbol{e}_i)$ in the basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$.
We would like to emphasize that the coordinates are written in the columns, and not
in the rows (which, of course, also would have been possible), which has a number
of advantages. It is clear that the matrix of the linear transformation depends on
both bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$. The situation here is the same as with the
coordinates of a vector. A linear transformation has no matrix “in and of itself”: in
order to associate a matrix with the transformation, it is necessary to choose bases
in the spaces $\mathsf{L}$ and $\mathsf{M}$.

Using matrix multiplication, as defined in Sect. 2.9, one can write formula (3.25)
in a more compact form. To do so, we introduce the following notation: Let $\boldsymbol{\alpha}$ be a
row vector (a matrix of type $(1, n)$) with coordinates $\alpha_1, \ldots, \alpha_n$, and let $\boldsymbol{\beta}$ be a row
vector with coordinates $\beta_1, \ldots, \beta_m$. Similarly, let $[\boldsymbol{\alpha}]$ be a column vector (a matrix
of type $(n, 1)$) consisting of the same coordinates $\alpha_1, \ldots, \alpha_n$, only now written
vertically, and let $[\boldsymbol{\beta}]$ be a column vector consisting of $\beta_1, \ldots, \beta_m$, that is,

$$[\boldsymbol{\alpha}] = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad
[\boldsymbol{\beta}] = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}.$$

It is clear that $\boldsymbol{\alpha}$ and $[\boldsymbol{\alpha}]$ are interchanged under the transpose operation, that is,
$\boldsymbol{\alpha}^* = [\boldsymbol{\alpha}]$, and similarly, $\boldsymbol{\beta}^* = [\boldsymbol{\beta}]$. Recalling the definition of matrix multiplication,
we see that formula (3.25) has the form

$$[\boldsymbol{\beta}] = A[\boldsymbol{\alpha}] \quad \text{or} \quad \boldsymbol{\beta} = \boldsymbol{\alpha} A^*. \tag{3.27}$$

The formulas that we have obtained show that with the chosen bases, a linear
transformation is uniquely determined by its matrix. Conversely, having chosen
bases for the vector spaces $\mathsf{L}$ and $\mathsf{M}$ in some way, if we define the mapping
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$ with the help of relationships (3.22) and (3.23) with an arbitrary matrix
$A = (a_{ij})$, it is easy to verify that $\mathcal{A}$ will be a linear transformation. Therefore, there
exists a bijection between the set $\mathcal{L}(\mathsf{L}, \mathsf{M})$ of linear transformations of $\mathsf{L}$ into $\mathsf{M}$ and the
set of matrices of type $(m, n)$. It is the choice of bases in the spaces $\mathsf{L}$ and $\mathsf{M}$ that
determines this correspondence. In the following section, we shall explain precisely
how the matrix of a linear transformation depends on the choice of bases.
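
The rule that the $i$th column of the matrix holds the coordinates of $\mathcal{A}(\boldsymbol{e}_i)$ can be tried on the differentiation map of Example 3.53. The sketch below assumes the bases $1, t, t^2, t^3$ for polynomials of degree at most 3 and $1, t, t^2$ for degree at most 2; the function names `matrix_of`, `derivative`, and `apply_matrix` are hypothetical.

```python
def matrix_of(transform, dim_L, dim_M):
    """Build the (dim_M, dim_L) matrix whose i-th column is transform(e_i)."""
    columns = []
    for i in range(dim_L):
        e_i = [1 if j == i else 0 for j in range(dim_L)]  # coordinates of the i-th basis vector
        columns.append(transform(e_i))
    # Transpose the list of columns into a list of rows.
    return [[columns[i][j] for i in range(dim_L)] for j in range(dim_M)]

def derivative(coeffs):
    """Coordinates of f'(t) in 1, t, t^2 from those of f(t) in 1, t, t^2, t^3."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def apply_matrix(A, alpha):
    """Compute the column [beta] = A[alpha] of formula (3.27)."""
    return [sum(a * x for a, x in zip(row, alpha)) for row in A]

A = matrix_of(derivative, 4, 3)
alpha = [5, 1, -2, 4]            # f(t) = 5 + t - 2t^2 + 4t^3
beta = apply_matrix(A, alpha)    # coordinates of f'(t) = 1 - 4t + 12t^2
print(A, beta)
```

The resulting matrix is of type $(3, 4)$, that is, $(m, n)$ with $m = \dim \mathsf{M}$ and $n = \dim \mathsf{L}$, and multiplying it by the coordinate column gives the same result as differentiating directly.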

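We shall denote the space of all linear transformations of the space $\mathsf{L}$ into $\mathsf{M}$ by
$\mathcal{L}(\mathsf{L}, \mathsf{M})$. This set can itself be viewed as a vector space if for the mappings $\mathcal{A}$ and
$\mathcal{B}$ in $\mathcal{L}(\mathsf{L}, \mathsf{M})$ we define the vector sum and the product of a vector and a scalar $\alpha$ by
the following formulas:

$$
\begin{aligned}
(\mathcal{A} + \mathcal{B})(\boldsymbol{x}) &= \mathcal{A}(\boldsymbol{x}) + \mathcal{B}(\boldsymbol{x}), \\
(\alpha \mathcal{A})(\boldsymbol{x}) &= \alpha \mathcal{A}(\boldsymbol{x}).
\end{aligned}
\tag{3.28}
$$

It is easily checked that $\mathcal{A} + \mathcal{B}$ and $\alpha \mathcal{A}$ are again linear transformations of $\mathsf{L}$ into $\mathsf{M}$,
that is, each of them satisfies conditions (3.21), while the operations that we have
defined satisfy conditions (a)-(h) of a vector space. The null vector of the space
$\mathcal{L}(\mathsf{L}, \mathsf{M})$ is the linear transformation $\mathcal{O} : \mathsf{L} \to \mathsf{M}$, defined by the formula $\mathcal{O}(\boldsymbol{x}) = \boldsymbol{0}$
for all $\boldsymbol{x} \in \mathsf{L}$ (in the last equality, $\boldsymbol{0}$ denotes, of course, the null vector of the space
$\mathsf{M}$). It is called the null transformation.

For some bases, suppose the matrix $A$ of type (3.26) corresponds to the
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$, while the matrix $B$ of the same type corresponds to the
transformation $\mathcal{B} : \mathsf{L} \to \mathsf{M}$. We now explain which matrices correspond to the
transformations $\mathcal{A} + \mathcal{B}$ and $\alpha \mathcal{A}$ defined by the conditions (3.28). By (3.23), we have

$$
\begin{aligned}
(\mathcal{A} + \mathcal{B})(\boldsymbol{e}_i) &= a_{1i} \boldsymbol{f}_1 + a_{2i} \boldsymbol{f}_2 + \cdots + a_{mi} \boldsymbol{f}_m + b_{1i} \boldsymbol{f}_1 + b_{2i} \boldsymbol{f}_2 + \cdots + b_{mi} \boldsymbol{f}_m \\
&= (a_{1i} + b_{1i}) \boldsymbol{f}_1 + (a_{2i} + b_{2i}) \boldsymbol{f}_2 + \cdots + (a_{mi} + b_{mi}) \boldsymbol{f}_m,
\end{aligned}
$$

and consequently the matrix $A + B$ corresponds to the transformation $\mathcal{A} + \mathcal{B}$. It can
be checked even more simply that the transformation $\alpha \mathcal{A}$ corresponds to the matrix
$\alpha A$. We thus see again that the set of linear transformations $\mathcal{L}(\mathsf{L}, \mathsf{M})$, or the set of
matrices of type $(m, n)$, is converted into a vector space.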

In conclusion, let us consider the composition of mappings that are linear
transformations.

Let $\mathsf{L}$, $\mathsf{M}$, $\mathsf{N}$ be vector spaces, and let $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ and $\mathcal{B} : \mathsf{M} \to \mathsf{N}$ be linear
transformations. We observe that this is a special case of mappings between arbitrary
sets, and by the general definition (see p. xiv), the composition of mappings $\mathcal{B}$ and
$\mathcal{A}$ is the mapping $\mathcal{B}\mathcal{A} : \mathsf{L} \to \mathsf{N}$ given by the formula

$$(\mathcal{B}\mathcal{A})(\boldsymbol{x}) = \mathcal{B}(\mathcal{A}(\boldsymbol{x})) \tag{3.29}$$

for all $\boldsymbol{x} \in \mathsf{L}$. A simple verification shows that $\mathcal{B}\mathcal{A}$ is a linear transformation: it is
necessary only to verify by substitution into (3.29) that all the relationships (3.21)
are satisfied by $\mathcal{B}\mathcal{A}$ if they are satisfied for $\mathcal{A}$ and $\mathcal{B}$. In particular, in the case
$\mathsf{L} = \mathsf{M} = \mathsf{N}$ we obtain that the composition of linear transformations from $\mathsf{L}$ to $\mathsf{L}$ is
again a linear transformation from $\mathsf{L}$ to $\mathsf{L}$.

Let us assume now that in the vector spaces $\mathsf{L}$, $\mathsf{M}$, and $\mathsf{N}$ we have chosen bases
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$, and $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_l$. We shall denote the matrix of the linear
transformation $\mathcal{A}$ in the bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$ by $A$, and the matrix of the
linear transformation $\mathcal{B}$ in the bases $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$ and $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_l$ by $B$, and we seek
the matrix of the linear transformation $\mathcal{B}\mathcal{A}$ in the bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_l$.
To this end, we must substitute the formulas of (3.23) for the transformation $\mathcal{A}$ into
analogous formulas for the transformation $\mathcal{B}$:

$$
\begin{aligned}
\mathcal{B}(\boldsymbol{f}_1) &= b_{11} \boldsymbol{g}_1 + b_{21} \boldsymbol{g}_2 + \cdots + b_{l1} \boldsymbol{g}_l, \\
\mathcal{B}(\boldsymbol{f}_2) &= b_{12} \boldsymbol{g}_1 + b_{22} \boldsymbol{g}_2 + \cdots + b_{l2} \boldsymbol{g}_l, \\
&\;\;\vdots \\
\mathcal{B}(\boldsymbol{f}_m) &= b_{1m} \boldsymbol{g}_1 + b_{2m} \boldsymbol{g}_2 + \cdots + b_{lm} \boldsymbol{g}_l.
\end{aligned}
\tag{3.30}
$$

Formulas (3.23) and (3.30) represent two linear replacements in which the
vectors play the role of the variables, whereas in other respects, they are no different
from linear replacements of variables as examined by us earlier (see p. 62).
Consequently, the result of sequentially applying these replacements will be the same as
in Sect. 2.9, namely a linear replacement with the matrix $BA$; that is, we obtain the
relationship

$$(\mathcal{B}\mathcal{A})(\boldsymbol{e}_i) = \sum_{j=1}^{l} c_{ji} \boldsymbol{g}_j, \qquad i = 1, \ldots, n,$$

where the matrix $C = (c_{ij})$ of the transformation $\mathcal{B}\mathcal{A}$ is $BA$. We have thus
established that the composition of linear transformations corresponds to the
multiplication of their matrices, taken in the same order.
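
The correspondence between composition and the product $BA$ can be verified on coordinates. The following is a minimal sketch: the matrices `A` and `B` are arbitrary sample data for hypothetical transformations $\mathsf{L} \to \mathsf{M}$ and $\mathsf{M} \to \mathsf{N}$ with $n = 2$, $m = 3$, $l = 2$, and the helper names are assumptions.

```python
def matmul(B, A):
    """Product BA of two matrices given as lists of rows."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]

def apply_matrix(A, alpha):
    """Coordinates of the image: [beta] = A[alpha]."""
    return [sum(a * x for a, x in zip(row, alpha)) for row in A]

A = [[1, 2], [0, 1], [3, 0]]   # type (3, 2): sample matrix of a transformation L -> M
B = [[1, 0, 1], [2, 1, 0]]     # type (2, 3): sample matrix of a transformation M -> N
alpha = [4, -1]                # coordinates of a vector x in L

via_composition = apply_matrix(B, apply_matrix(A, alpha))  # apply A first, then B
via_product = apply_matrix(matmul(B, A), alpha)            # apply the single matrix BA
print(via_composition, via_product)
```

Both routes give the same coordinates in $\mathsf{N}$, and the order of the factors matches the order of the transformations in $\mathcal{B}\mathcal{A}$.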

We observe that we have thus obtained a shorter and more natural proof of the
associativity of matrix multiplication (formula (2.52)) in Sect. 2.9. Indeed, the
associativity of the composition of arbitrary mappings of sets is well known (p. xiv), and
in view of the established connection between linear transformations and their matrices
(in whatever selected bases), we obtain the associativity of the matrix product.

The operations of addition and composition of linear transformations are connected by the
relationships

\[
\mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C}, \qquad
(\mathcal{A} + \mathcal{B})\mathcal{C} = \mathcal{A}\mathcal{C} + \mathcal{B}\mathcal{C},
\]

called the distributive property. To prove this, one may either use the definitions of
addition and composition given above together with the well-known property of
distributivity of the real and complex numbers (or of the elements of any field $\mathbb{K}$,
since it derives from the properties of a field) or derive the distributivity of linear
transformations from what was proved in Sect. 2.9 regarding distributivity of addition
and multiplication of matrices (formula (2.53)), again using the connection established
above between a linear transformation and its matrix.


3.4 Change of Coordinates 

We have seen that the coordinates of a vector relative to a basis depend on which
basis in the vector space we have chosen. We have seen as well that the matrix of a
linear transformation of vector spaces depends on the choice of bases in both vector
spaces. We shall now establish an explicit form of this dependence both for vectors
and for transformations.

Let $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ be a certain basis of the vector space $\mathsf{L}$. By Corollary 3.34, a basis
of the given vector space consists of a fixed number of vectors, equal to $\dim \mathsf{L}$.
Let $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ be another basis of $\mathsf{L}$. By definition, every vector $\boldsymbol{x} \in \mathsf{L}$ is a linear
combination of the vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$, that is, it can be expressed in the form

\[
\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \alpha_2 \boldsymbol{e}_2 + \cdots + \alpha_n \boldsymbol{e}_n \tag{3.31}
\]

with coefficients $\alpha_i$, which are the coordinates of $\boldsymbol{x}$ in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$. Similarly,
we have the representation

\[
\boldsymbol{x} = \alpha'_1 \boldsymbol{e}'_1 + \alpha'_2 \boldsymbol{e}'_2 + \cdots + \alpha'_n \boldsymbol{e}'_n \tag{3.32}
\]

with coordinates $\alpha'_i$ of the vector $\boldsymbol{x}$ in the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$.

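Furthermore, each of the vectors $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ is itself a linear combination of the
vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$, that is,

\[
\begin{aligned}
\boldsymbol{e}'_1 &= c_{11}\boldsymbol{e}_1 + c_{21}\boldsymbol{e}_2 + \cdots + c_{n1}\boldsymbol{e}_n, \\
\boldsymbol{e}'_2 &= c_{12}\boldsymbol{e}_1 + c_{22}\boldsymbol{e}_2 + \cdots + c_{n2}\boldsymbol{e}_n, \\
&\;\;\vdots \\
\boldsymbol{e}'_n &= c_{1n}\boldsymbol{e}_1 + c_{2n}\boldsymbol{e}_2 + \cdots + c_{nn}\boldsymbol{e}_n
\end{aligned}
\tag{3.33}
\]

with some scalars $c_{ij}$. And similarly, each of the vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ is a linear
combination of $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$, that is,

\[
\begin{aligned}
\boldsymbol{e}_1 &= c'_{11}\boldsymbol{e}'_1 + c'_{21}\boldsymbol{e}'_2 + \cdots + c'_{n1}\boldsymbol{e}'_n, \\
\boldsymbol{e}_2 &= c'_{12}\boldsymbol{e}'_1 + c'_{22}\boldsymbol{e}'_2 + \cdots + c'_{n2}\boldsymbol{e}'_n, \\
&\;\;\vdots \\
\boldsymbol{e}_n &= c'_{1n}\boldsymbol{e}'_1 + c'_{2n}\boldsymbol{e}'_2 + \cdots + c'_{nn}\boldsymbol{e}'_n
\end{aligned}
\tag{3.34}
\]

for some scalars $c'_{ij}$.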

Clearly, the collections of coefficients $c_{ij}$ and $c'_{ij}$ in formulas (3.33) and (3.34)
provide the exact same information about the “mutual relationship” between the
bases $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ in the space $\mathsf{L}$, and therefore it suffices for us to
know only one (either one will do) of these collections. More detailed information about
the relationship between the coefficients $c_{ij}$ and $c'_{ij}$ will be given below, but first,
we shall deduce a formula that describes the relationship between the coordinates of
the vector $\boldsymbol{x}$ in the bases $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$. To this end, we shall substitute
the expressions (3.33) for the vectors $\boldsymbol{e}'_i$ into (3.32). Grouping the requisite terms,
we obtain an expression for the vector $\boldsymbol{x}$ as a linear combination of $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$:

\[
\begin{aligned}
\boldsymbol{x} &= \alpha'_1 (c_{11}\boldsymbol{e}_1 + c_{21}\boldsymbol{e}_2 + \cdots + c_{n1}\boldsymbol{e}_n) + \cdots
      + \alpha'_n (c_{1n}\boldsymbol{e}_1 + c_{2n}\boldsymbol{e}_2 + \cdots + c_{nn}\boldsymbol{e}_n) \\
  &= (\alpha'_1 c_{11} + \alpha'_2 c_{12} + \cdots + \alpha'_n c_{1n})\boldsymbol{e}_1 + \cdots
      + (\alpha'_1 c_{n1} + \alpha'_2 c_{n2} + \cdots + \alpha'_n c_{nn})\boldsymbol{e}_n .
\end{aligned}
\]

Since $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ is a basis of the vector space $\mathsf{L}$ and the coordinates of the vector $\boldsymbol{x}$
in this basis are $\alpha_i$ (formula (3.31)), we obtain

\[
\begin{aligned}
\alpha_1 &= c_{11}\alpha'_1 + c_{12}\alpha'_2 + \cdots + c_{1n}\alpha'_n, \\
\alpha_2 &= c_{21}\alpha'_1 + c_{22}\alpha'_2 + \cdots + c_{2n}\alpha'_n, \\
&\;\;\vdots \\
\alpha_n &= c_{n1}\alpha'_1 + c_{n2}\alpha'_2 + \cdots + c_{nn}\alpha'_n .
\end{aligned}
\tag{3.35}
\]

Relationships (3.35) are called change-of-coordinate formulas for a vector. Such
a formula represents a linear change of variables, with the help of the matrix $C$
consisting of the coefficients $c_{ij}$, but in an order different from that in (3.33). In
particular, $C$ is the transpose of the matrix of coefficients (3.33). The matrix $C$ is
called the transition matrix from the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ to the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$, since
with its help, the coordinates of a vector in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ are expressed in
terms of its coordinates in the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$.

Using the product rule for matrices, the formula for the change of coordinates
can be written in a more compact form. To this end, we shall use notation from the
preceding section: $\alpha$ is a row vector consisting of the coordinates $\alpha_1, \dots, \alpha_n$, and
$[\alpha]$ is a column vector consisting of the very same coordinates. Keeping in mind the
definition of matrix multiplication (Sect. 2.9), we see that formula (3.35) takes the
form

\[
[\alpha] = C[\alpha'] \quad\text{or}\quad \alpha = \alpha' C^{*}. \tag{3.36}
\]


Remark 3.57 It is not difficult to see that the formulas for changing coordinates are
quite similar to the formulas for a linear transformation. More precisely, relationships
(3.35) and (3.36) are special cases of (3.25) and (3.27) for $m = n$, for example,
if the vector space $\mathsf{M}$ coincides with $\mathsf{L}$. This allows an interpretation of changing
coordinates (that is, changing bases) of a vector space $\mathsf{L}$ as a linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$.

Similarly, if we substitute expressions (3.34) for the vectors $\boldsymbol{e}_i$ into (3.31), we obtain
the relationship

\[
\begin{aligned}
\alpha'_1 &= c'_{11}\alpha_1 + c'_{12}\alpha_2 + \cdots + c'_{1n}\alpha_n, \\
\alpha'_2 &= c'_{21}\alpha_1 + c'_{22}\alpha_2 + \cdots + c'_{2n}\alpha_n, \\
&\;\;\vdots \\
\alpha'_n &= c'_{n1}\alpha_1 + c'_{n2}\alpha_2 + \cdots + c'_{nn}\alpha_n,
\end{aligned}
\tag{3.37}
\]

similar to (3.35). Formula (3.37) is also called the substitution formula for coordinates
of a vector. It represents the linear substitution of variables with the matrix $C'$,
which is the transpose of the matrix consisting of the coefficients $c'_{ij}$ from (3.34).
The matrix $C'$ is called the transition matrix from the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ to the basis
$\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$. In matrix form, formula (3.37) takes the form

\[
[\alpha'] = C'[\alpha] \quad\text{or}\quad \alpha' = \alpha\, C'^{*}. \tag{3.38}
\]

Using formulas (3.36) and (3.38), one easily establishes the connection between $C$
and $C'$.


Lemma 3.58 The transition matrices $C$ and $C'$ between any two bases of a vector
space are nonsingular and are the inverses of each other. That is, $C' = C^{-1}$.


Proof Substituting the expression $[\alpha'] = C'[\alpha]$ into $[\alpha] = C[\alpha']$, taking into
account the associativity of matrix multiplication, we obtain the equality $[\alpha] = (CC')[\alpha]$.
This equality holds for all column vectors $[\alpha]$ of a given length $n$, and
therefore, the matrix $CC'$ on the right-hand side is the identity matrix. Indeed,
rewriting this equality in the equivalent form $(CC' - E)[\alpha] = 0$, it becomes clear
that if the matrix $CC' - E$ contained at least one nonzero element, say in column $j$,
then there would exist a column vector $[\alpha]$ for which $(CC' - E)[\alpha] \neq 0$: it suffices to set
$\alpha_j = 1$ and all the other elements equal to zero. Thus we conclude that
$CC' = E$, from which by the definition of the inverse matrix (see Sect. 2.10), it
follows that $C' = C^{-1}$. □
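
Lemma 3.58 and formulas (3.36) and (3.38) can be tried out on a two-dimensional case. The
sketch below is only an illustration under stated assumptions: the new basis
$\boldsymbol{e}'_1 = \boldsymbol{e}_1 + \boldsymbol{e}_2$, $\boldsymbol{e}'_2 = \boldsymbol{e}_1 + 2\boldsymbol{e}_2$ and the helper names `mat_vec`, `mat_mul`, and
`inverse_2x2` are chosen here and do not come from the text. Exact rational arithmetic from
the standard library is used so that the comparisons are exact.

```python
from fractions import Fraction


def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the coordinate column v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]


def mat_mul(X, Y):
    """Return the matrix product XY for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a nonsingular 2x2 matrix, computed with exact fractions."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Columns of C hold the coordinates of e'_1 = e_1 + e_2 and e'_2 = e_1 + 2 e_2
# in the old basis e_1, e_2; this is the transpose of the coefficients in (3.33).
C = [[1, 1],
     [1, 2]]
C_prime = inverse_2x2(C)           # Lemma 3.58: C' is the inverse of C

alpha_new = [3, -1]                # coordinates of x in the basis e'_1, e'_2
alpha_old = mat_vec(C, alpha_new)  # formula (3.36): [alpha] = C [alpha']

# Formula (3.38) recovers the new coordinates from the old ones.
assert mat_vec(C_prime, alpha_old) == alpha_new
assert mat_mul(C, C_prime) == [[1, 0], [0, 1]]
```

Here $\boldsymbol{x} = 3\boldsymbol{e}'_1 - \boldsymbol{e}'_2 = 2\boldsymbol{e}_1 + \boldsymbol{e}_2$, which agrees with the value of `alpha_old`.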

We shall now explain how the matrix of a linear transformation depends on the
choice of bases. Suppose that in the bases $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$ of the vector
spaces $\mathsf{L}$ and $\mathsf{M}$ the transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ has matrix $A$, the coordinates of the
vector $\boldsymbol{x}$ are denoted by $\alpha_i$, and the coordinates of the vector $\mathcal{A}(\boldsymbol{x})$ are denoted by
$\beta_j$. Similarly, in the bases $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ and $\boldsymbol{f}'_1, \dots, \boldsymbol{f}'_m$ of these vector spaces, the
same transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ has matrix $A'$, the coordinates of the vector $\boldsymbol{x}$ are
denoted by $\alpha'_i$, and the coordinates of the vector $\mathcal{A}(\boldsymbol{x})$ are denoted by $\beta'_j$.

Let $C$ be the transition matrix from the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ to the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$,
which is a nonsingular matrix of order $n$, while $D$ is the transition matrix from the
basis $\boldsymbol{f}'_1, \dots, \boldsymbol{f}'_m$ to the basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$, which is a nonsingular matrix of order
$m$ (here $n$ and $m$ are the dimensions of the vector spaces $\mathsf{L}$ and $\mathsf{M}$). Then by the
change-of-coordinates formula (3.38), we obtain

\[
[\alpha'] = C^{-1}[\alpha], \qquad [\beta'] = D^{-1}[\beta],
\]

and formula (3.27) of the linear transformation gives us the equalities

\[
[\beta] = A[\alpha], \qquad [\beta'] = A'[\alpha'].
\]

Let us substitute on the right-hand side of the equality $[\beta'] = D^{-1}[\beta]$ the
expression $[\beta] = A[\alpha]$, and on the left-hand side, the expression
$[\beta'] = A'[\alpha'] = A'C^{-1}[\alpha]$, as a result of which we obtain the relationship

\[
A'C^{-1}[\alpha] = D^{-1}A[\alpha]. \tag{3.39}
\]

This line of argument holds for any vector $\boldsymbol{x} \in \mathsf{L}$, and hence equality (3.39) holds
for any column vector $[\alpha]$ of length $n$. Clearly, this is possible if and only if we have
the equality

\[
A'C^{-1} = D^{-1}A. \tag{3.40}
\]

Indeed, both matrices $A'C^{-1}$ and $D^{-1}A$ are of type $(m, n)$, and if they were not
equal, then there would be at least one row (with index $i$ between 1 and $m$) and
one column (with index $j$ between 1 and $n$) such that the $ij$th elements of the
matrices $A'C^{-1}$ and $D^{-1}A$ did not coincide. But then one could easily identify a
column vector $[\alpha]$ for which the equality (3.39) was not satisfied. For example, set
its element $\alpha_j$ equal to 1, and all the rest to zero.

Let us note that we could have obtained formula (3.40) by considering the transition
from one basis to another as a linear transformation of vector spaces given
by multiplication by the transition matrix (see Remark 3.57 above). Indeed, in this
case, we obtain the following diagram, in which each arrow indicates multiplication
of a column vector by the matrix next to it:

\[
\begin{array}{ccc}
[\alpha] & \xrightarrow{\;A\;} & [\beta] \\
\Big\downarrow{\scriptstyle C^{-1}} & & \Big\downarrow{\scriptstyle D^{-1}} \\
[\alpha'] & \xrightarrow{\;A'\;} & [\beta']
\end{array}
\]

By the definition of matrix multiplication, from the vector $[\alpha]$, we can obtain the
vector $[\beta']$ located in the opposite corner of the diagram in two ways: multiplication
by the matrix $A'C^{-1}$ and multiplication by the matrix $D^{-1}A$. Both methods should
give the same result (in such a case, we say that the diagram is commutative, and this
is equivalent to equality (3.40)).

We can multiply both sides of (3.40) on the right by the matrix $C$, obtaining as a
result

\[
A' = D^{-1}AC, \tag{3.41}
\]

which is called the formula for a change of matrix of a linear transformation.

In the case that the dimensions $n$ and $m$ of the vector spaces $\mathsf{L}$ and $\mathsf{M}$ coincide,
both matrices $A$ and $A'$ are square (of order $n = m$), and for such matrices, one has
the notion of the determinant. Then by Theorem 2.54, from formula (3.41), there
follows the relationship

\[
|A'| = |D^{-1}| \cdot |A| \cdot |C| = |D|^{-1} \cdot |A| \cdot |C|. \tag{3.42}
\]

Since $C$ and $D$ are transition matrices, they are nonsingular, and therefore the
determinants $|A'|$ and $|A|$ differ from each other through multiplication by the number
$|D|^{-1}|C| \neq 0$. This indicates that if the matrix of a linear transformation of spaces
of the same dimension is nonsingular for some choice of bases, then it will be
nonsingular for any other choice of bases for these spaces. Therefore, we may make the
following definition.


Definition 3.59 A linear transformation of spaces of the same dimension is said to
be nonsingular if its matrix (expressed in terms of some choice of bases of the two
spaces) is nonsingular.


There is a special case, which is of greatest importance for a variety of applications
to which Chaps. 4 and 5 will be devoted, in which the spaces $\mathsf{L}$ and $\mathsf{M}$ coincide
(that is, $\mathcal{A}$ is a linear transformation of a vector space into itself and so $n = m$),
the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ coincides with the basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$, and the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$
coincides with $\boldsymbol{f}'_1, \dots, \boldsymbol{f}'_m$. Consequently, in this case, $D = C$, and the
change-of-matrix formula (3.41) is converted to

\[
A' = C^{-1}AC, \tag{3.43}
\]

and equation (3.42) assumes the very simple form $|A'| = |A|$. This means that
although the matrix of a linear transformation of a vector space $\mathsf{L}$ into itself depends
on the choice of basis, its determinant does not depend on the choice of basis. This
circumstance is frequently expressed by saying that the determinant is an invariant of
a linear transformation of a vector space into itself. In this case, we may give the
following definition.

Definition 3.60 The determinant of a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ of a vector
space to itself (denoted by $|\mathcal{A}|$) is the determinant of its matrix $A$, expressed in terms
of any basis of the space $\mathsf{L}$, that is, $|\mathcal{A}| = |A|$.
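
The independence of the determinant from the basis can be observed on a concrete pair of
matrices. The following is a minimal sketch, assuming an arbitrarily chosen matrix $A$ and
transition matrix $C$ of order 2 (neither comes from the text); `mat_mul` and `det_2x2` are
hypothetical helper names.

```python
def mat_mul(X, Y):
    """Return the matrix product XY for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def det_2x2(M):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c


# Matrix of a transformation of a 2-dimensional space into itself.
A = [[2, 1],
     [0, 3]]
# A transition matrix C and its inverse, written out by hand and checked.
C = [[1, 1],
     [1, 2]]
C_inv = [[2, -1],
         [-1, 1]]
assert mat_mul(C, C_inv) == [[1, 0], [0, 1]]

# Formula (3.43): the matrix of the same transformation in the other basis.
A_new = mat_mul(C_inv, mat_mul(A, C))

# The two matrices differ, but their determinants coincide.
assert A_new != A
assert det_2x2(A_new) == det_2x2(A)
```

In this instance both determinants equal 6, while the matrices themselves are different.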


3.5 Isomorphisms of Vector Spaces 

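In this section we shall investigate the case in which a linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a bijection. We observe first of all that if $\mathcal{A}$ is a bijective linear
transformation from $\mathsf{L}$ to $\mathsf{M}$, then like any bijective mapping (not necessarily linear),
it has an inverse mapping $\mathcal{A}^{-1} : \mathsf{M} \to \mathsf{L}$. It is clear that $\mathcal{A}^{-1}$ will also be a linear
transformation from $\mathsf{M}$ to $\mathsf{L}$. Indeed, if for the vector $\boldsymbol{y}_1 \in \mathsf{M}$ there is a unique vector
$\boldsymbol{x}_1 \in \mathsf{L}$ such that $\mathcal{A}(\boldsymbol{x}_1) = \boldsymbol{y}_1$, and for $\boldsymbol{y}_2 \in \mathsf{M}$ there is an analogous vector $\boldsymbol{x}_2 \in \mathsf{L}$,
then by the linearity of $\mathcal{A}$ we have $\mathcal{A}(\boldsymbol{x}_1 + \boldsymbol{x}_2) = \boldsymbol{y}_1 + \boldsymbol{y}_2$, and so by the definition of the
inverse mapping, we obtain the first of conditions (3.21) in the definition of a linear
transformation:

\[
\mathcal{A}^{-1}(\boldsymbol{y}_1 + \boldsymbol{y}_2) = \boldsymbol{x}_1 + \boldsymbol{x}_2 = \mathcal{A}^{-1}(\boldsymbol{y}_1) + \mathcal{A}^{-1}(\boldsymbol{y}_2).
\]

Similarly, but even more simply, we can verify the second condition of (3.21), that
is, that $\mathcal{A}^{-1}(\alpha \boldsymbol{y}) = \alpha \mathcal{A}^{-1}(\boldsymbol{y})$ for an arbitrary vector $\boldsymbol{y} \in \mathsf{M}$ and scalar $\alpha$.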

Definition 3.61 Vector spaces $\mathsf{L}$ and $\mathsf{M}$ between which there exists a bijective linear
transformation $\mathcal{A}$ are said to be isomorphic, and the transformation $\mathcal{A}$ itself is called
an isomorphism. The fact that vector spaces $\mathsf{L}$ and $\mathsf{M}$ are isomorphic is denoted by
$\mathsf{L} \simeq \mathsf{M}$. If we wish to specify a concrete transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ that produces the
isomorphism, then we write $\mathcal{A} : \mathsf{L} \xrightarrow{\sim} \mathsf{M}$.

The property of being isomorphic defines an equivalence relation on the set of
all vector spaces (see the definition on p. xii). To prove this, we need to verify three
properties: reflexivity, symmetry, and transitivity. Reflexivity is obvious: we have
simply to consider the identity mapping $\mathcal{E} : \mathsf{L} \xrightarrow{\sim} \mathsf{L}$. Symmetry is also obvious: if
$\mathcal{A} : \mathsf{L} \xrightarrow{\sim} \mathsf{M}$, then the inverse transformation $\mathcal{A}^{-1}$ is also an isomorphism, that is,
$\mathcal{A}^{-1} : \mathsf{M} \xrightarrow{\sim} \mathsf{L}$. Finally, if $\mathcal{A} : \mathsf{L} \xrightarrow{\sim} \mathsf{M}$ and $\mathcal{B} : \mathsf{M} \xrightarrow{\sim} \mathsf{N}$, then, as is easily verified, the
transformation $\mathcal{C} = \mathcal{B}\mathcal{A}$ is also an isomorphism, that is, $\mathcal{C} : \mathsf{L} \xrightarrow{\sim} \mathsf{N}$, which
establishes transitivity. Therefore, the set of all vector spaces can be represented as a
collection of equivalence classes of vector spaces whose elements are mutually
isomorphic.

Example 3.62 With the choice of a basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ in a vector space $\mathsf{L}$ over a field
$\mathbb{K}$, assigning to a vector $\boldsymbol{x} \in \mathsf{L}$ the row consisting of its coordinates in this basis
establishes an isomorphism between $\mathsf{L}$ and the row space $\mathbb{K}^n$. Similarly, writing the
elements of a row in the form of a column produces an isomorphism between the row space
and the column space (with rows and columns containing the same number of
elements). This explains why we use a single symbol for denoting these spaces.

Example 3.63 Through the selection of bases $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$ in the
spaces $\mathsf{L}$ and $\mathsf{M}$ of dimensions $n$ and $m$, we assign to each linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$ its matrix $A$. We thereby establish an isomorphism between the space
$\mathcal{L}(\mathsf{L}, \mathsf{M})$ and the space of rectangular matrices of type $(m, n)$.

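Theorem 3.64 Two finite-dimensional vector spaces $\mathsf{L}$ and $\mathsf{M}$ are isomorphic if and
only if $\dim \mathsf{L} = \dim \mathsf{M}$.

Proof The fact that all vector spaces of a given finite dimension are isomorphic
follows easily from the fact that every vector space $\mathsf{L}$ of finite dimension $n$ is
isomorphic to the space $\mathbb{K}^n$ of rows or columns of length $n$ (Example 3.62). Indeed,
let $\mathsf{L}$ and $\mathsf{M}$ be two vector spaces of dimension $n$. Then $\mathsf{L} \simeq \mathbb{K}^n$ and $\mathsf{M} \simeq \mathbb{K}^n$, from
which as a result of transitivity and symmetry, we obtain $\mathsf{L} \simeq \mathsf{M}$.

We now prove that isomorphic vector spaces $\mathsf{L}$ and $\mathsf{M}$ have the same dimension.
Let us assume that $\mathcal{A} : \mathsf{L} \xrightarrow{\sim} \mathsf{M}$ is an isomorphism. Let us denote by $\boldsymbol{0} \in \mathsf{L}$ and $\boldsymbol{0}' \in \mathsf{M}$
the null vectors in the spaces $\mathsf{L}$ and $\mathsf{M}$. Recall, by the property of linear
transformations that we proved on p. 102, that $\mathcal{A}(\boldsymbol{0}) = \boldsymbol{0}'$. Let $\dim \mathsf{M} = m$, and let us choose
in $\mathsf{M}$ some basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$. By the definition of an isomorphism, there exist vectors
$\boldsymbol{e}_1, \dots, \boldsymbol{e}_m$ in $\mathsf{L}$ such that $\boldsymbol{f}_i = \mathcal{A}(\boldsymbol{e}_i)$ for $i = 1, \dots, m$. We shall prove
that the vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_m$ form a basis of the space $\mathsf{L}$, whence it will follow that
$\dim \mathsf{L} = m$, completing the proof of the theorem.

First of all, let us show that these vectors are linearly independent. Indeed, if
$\boldsymbol{e}_1, \dots, \boldsymbol{e}_m$ were linearly dependent, then there would exist scalars $\alpha_1, \dots, \alpha_m$, not
all equal to zero, such that

\[
\alpha_1 \boldsymbol{e}_1 + \alpha_2 \boldsymbol{e}_2 + \cdots + \alpha_m \boldsymbol{e}_m = \boldsymbol{0}.
\]

But after applying the linear transformation $\mathcal{A}$ to both sides of this relationship, in
view of the equality $\mathcal{A}(\boldsymbol{0}) = \boldsymbol{0}'$, we would obtain

\[
\alpha_1 \boldsymbol{f}_1 + \alpha_2 \boldsymbol{f}_2 + \cdots + \alpha_m \boldsymbol{f}_m = \boldsymbol{0}',
\]

from which follows $\alpha_1 = 0, \dots, \alpha_m = 0$, since by assumption, the vectors
$\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$ are linearly independent. This contradicts the choice of the scalars.

Let us now prove that every vector $\boldsymbol{x} \in \mathsf{L}$ is a linear combination of the vectors
$\boldsymbol{e}_1, \dots, \boldsymbol{e}_m$. Let us set $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{y}$ and express $\boldsymbol{y}$ in the form

\[
\boldsymbol{y} = \alpha_1 \boldsymbol{f}_1 + \alpha_2 \boldsymbol{f}_2 + \cdots + \alpha_m \boldsymbol{f}_m .
\]

Applying to both sides of this equality the linear transformation $\mathcal{A}^{-1}$, we obtain

\[
\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \alpha_2 \boldsymbol{e}_2 + \cdots + \alpha_m \boldsymbol{e}_m,
\]

as required. We have thus shown that the vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_m$ form a basis of the
vector space $\mathsf{L}$. □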

Example 3.65 Suppose we are given a system of $m$ homogeneous linear equations
in $n$ unknowns $x_1, \dots, x_n$ and with coefficients in the field $\mathbb{K}$. As we saw in
Example 3.8 (p. 84), its solutions form a subspace $\mathsf{L}'$ of the space $\mathbb{K}^n$ of rows of length $n$.
Since we know that the dimension of the space $\mathbb{K}^n$ is $n$, it follows that $\dim \mathsf{L}' \le n$.
Let us determine this dimension. To this end, using Theorem 1.15, let us bring our
system into echelon form (1.18). Since the equations of the original system are
homogeneous, it follows that in (1.18), all the equations will also be homogeneous,
that is, all the constant terms $b_i$ are equal to 0. Let $r$ be the number of principal
unknowns, and hence $n - r$ is the number of free unknowns. As shown following the
proof of Theorem 1.15, we shall obtain all the solutions of our system by assigning
arbitrary values to the free unknowns and then determining the principal unknowns
from the first $r$ equations. That is, if $(x_1, \dots, x_n)$ is some solution, then associating
with it the row of values of the free unknowns $(x_{i_1}, \dots, x_{i_{n-r}})$, we obtain a bijection
between the set of solutions of the system and rows of length $n - r$. An obvious
verification shows that this relationship is an isomorphism of the spaces $\mathbb{K}^{n-r}$ and
$\mathsf{L}'$. Since $\dim \mathbb{K}^{n-r} = n - r$, then by Theorem 3.64, the dimension of the space $\mathsf{L}'$
is also equal to $n - r$. Finally, we observe that the number $r$ is equal to the rank of
the matrix of the system (see Sect. 2.8). Therefore, we have obtained the following
result: the space of solutions of a homogeneous linear system of equations has
dimension $n - r$, where $n$ is the number of unknowns, and $r$ is the rank of the matrix
of the system.
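
The count $n - r$ can be reproduced mechanically. The sketch below computes the rank by
Gaussian elimination over the rationals and applies it to a small homogeneous system; the
function name `rank`, the sample system, and the two candidate solutions are assumptions made
for this illustration and are not taken from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (a list of rows), by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(M[0]) if M else 0
    r = 0  # number of pivots (principal unknowns) found so far
    for col in range(n_cols):
        # Look for a row at position r or below with a nonzero entry in col.
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column gives a free unknown
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Coefficient matrix of a homogeneous system: 3 equations, n = 4 unknowns.
# The third equation is the sum of the first two.
system = [[1, 2, -1, 1],
          [2, 4, 1, -1],
          [3, 6, 0, 0]]

n = len(system[0])
r = rank(system)
dim_solutions = n - r  # dimension of the solution space

# Two linearly independent solutions, consistent with dimension n - r = 2.
candidates = [[-2, 1, 0, 0], [0, 0, 1, 1]]
for s in candidates:
    assert all(sum(c * x for c, x in zip(row, s)) == 0 for row in system)
```

For this system the rank is 2, so the solution space has dimension $4 - 2 = 2$, and the two
rows in `candidates` form one possible basis of it.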

Let $\mathcal{A} : \mathsf{L} \xrightarrow{\sim} \mathsf{M}$ be an isomorphism of vector spaces $\mathsf{L}$ and $\mathsf{M}$ of dimension $n$,
and let $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ be a basis of $\mathsf{L}$. Then the vectors $\mathcal{A}(\boldsymbol{e}_1), \dots, \mathcal{A}(\boldsymbol{e}_n)$ are linearly
independent. Indeed, if not, we would have the equality

\[
\alpha_1 \mathcal{A}(\boldsymbol{e}_1) + \cdots + \alpha_n \mathcal{A}(\boldsymbol{e}_n) = \mathcal{A}(\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n) = \boldsymbol{0}'
\]

with scalars $\alpha_1, \dots, \alpha_n$ not all equal to zero, from which by the property $\mathcal{A}(\boldsymbol{0}) = \boldsymbol{0}'$ and
the fact that $\mathcal{A}$ is a bijection, we obtain the relationship
$\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n = \boldsymbol{0}$, contradicting the definition of a basis. Hence
the vectors $\mathcal{A}(\boldsymbol{e}_1), \dots, \mathcal{A}(\boldsymbol{e}_n)$, being $n$ linearly independent vectors in a space of
dimension $n$, form a basis of the vector space $\mathsf{M}$. It is easy to see that
in these bases, the matrix of the transformation $\mathcal{A}$ is the identity matrix of order $n$,
and the coordinates of an arbitrary vector $\boldsymbol{x} \in \mathsf{L}$ in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ coincide with
the coordinates of the vector $\mathcal{A}(\boldsymbol{x})$ in the basis $\mathcal{A}(\boldsymbol{e}_1), \dots, \mathcal{A}(\boldsymbol{e}_n)$. Consequently, the
transformation $\mathcal{A}$ is nonsingular.

A similar argument easily establishes the converse fact that an arbitrary nonsingular
linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ of vector spaces of the same dimension is an
isomorphism.

Remark 3.66 Theorem 3.64 shows that all assertions formulated in terms of concepts
entering the definition of a vector space are equivalent for all spaces of a given
dimension. In other words, there exists a single, unique theory of $n$-dimensional
vector spaces for a given $n$. An example of the opposite situation can be found
in Euclidean geometry and the non-Euclidean geometry of Lobachevsky. It is well
known that if we accept all the axioms of Euclid except for the “parallel postulate”
(so-called absolute geometry), then there are two completely different geometries
that satisfy these axioms: Euclid’s and Lobachevsky’s. With vector spaces, such a
situation does not arise.

The definition of an isomorphism given by a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$
consists of two parts. The first asserts that for an arbitrary vector $\boldsymbol{y} \in \mathsf{M}$, there
exists a vector $\boldsymbol{x} \in \mathsf{L}$ such that $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{y}$, that is, the image $\mathcal{A}(\mathsf{L})$ coincides with the
entire space $\mathsf{M}$. The second condition is that the equality $\mathcal{A}(\boldsymbol{x}_1) = \mathcal{A}(\boldsymbol{x}_2)$ holds only
for $\boldsymbol{x}_1 = \boldsymbol{x}_2$. Since $\mathcal{A}$ is a linear transformation, then for the latter condition to be
satisfied, it is necessary that the equality $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{0}'$ imply $\boldsymbol{x} = \boldsymbol{0}$. This motivates the
following definition.

Definition 3.67 The set of vectors $\boldsymbol{x}$ in the space $\mathsf{L}$ such that $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{0}'$ is called the
kernel of the linear transformation $\mathcal{A}$.$^5$ In other words, the kernel is the preimage of
the null vector under the mapping $\mathcal{A}$.

$^5$ Translator's note: Another name for kernel that the reader may encounter is null space
(since the kernel is the space of all vectors that map to the null vector).

It is obvious that the kernel of a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a subspace
of $\mathsf{L}$, and that its image $\mathcal{A}(\mathsf{L})$ is a subspace of $\mathsf{M}$.

Thus to satisfy the second condition in the definition of a bijection, it is necessary
that the kernel of $\mathcal{A}$ consist of the null vector alone. But this condition is sufficient as
well. Indeed, if for vectors $\boldsymbol{x}_1 \neq \boldsymbol{x}_2$ the condition $\mathcal{A}(\boldsymbol{x}_1) = \mathcal{A}(\boldsymbol{x}_2)$ is satisfied, then
subtracting one side of the equality from the other and applying the linearity of the
transformation $\mathcal{A}$, we obtain $\mathcal{A}(\boldsymbol{x}_1 - \boldsymbol{x}_2) = \boldsymbol{0}'$, that is, the nonzero vector $\boldsymbol{x}_1 - \boldsymbol{x}_2$ is in
the kernel of $\mathcal{A}$. Therefore, the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is an isomorphism if
and only if its image coincides with all of $\mathsf{M}$ and its kernel is equal to $(\boldsymbol{0})$. We shall
now show that if $\mathcal{A}$ is a linear transformation of spaces of the same finite dimension,
then an isomorphism results if either one or the other of the two conditions is
satisfied.

Theorem 3.68 If $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a linear transformation of vector spaces of the same
finite dimension and the kernel of $\mathcal{A}$ is equal to $(\boldsymbol{0})$, then $\mathcal{A}$ is an isomorphism.

Proof Let $\dim \mathsf{L} = \dim \mathsf{M} = n$. Let us consider a particular basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ of the
vector space $\mathsf{L}$. The transformation $\mathcal{A}$ maps each vector $\boldsymbol{e}_i$ to some vector
$\boldsymbol{f}_i = \mathcal{A}(\boldsymbol{e}_i)$ of the space $\mathsf{M}$. Then the vectors $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ are linearly independent, that is,
they form a basis of the space $\mathsf{M}$. Indeed, from the linearity of the transformation $\mathcal{A}$,
for arbitrary scalars $\alpha_1, \dots, \alpha_n$, we have the equality

\[
\mathcal{A}(\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n) = \alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_n \boldsymbol{f}_n . \tag{3.44}
\]

If $\alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_n \boldsymbol{f}_n = \boldsymbol{0}'$ for some collection of scalars $\alpha_1, \dots, \alpha_n$, then from the
condition that the kernel of $\mathcal{A}$ is equal to $(\boldsymbol{0})$, we will have $\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n = \boldsymbol{0}$,
from which it follows, by the definition of a basis, that all the scalars $\alpha_i$ are equal
to zero. The relationship (3.44) also shows that the transformation $\mathcal{A}$ maps each
vector $\boldsymbol{x} \in \mathsf{L}$ with coordinates $(\alpha_1, \dots, \alpha_n)$ in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ into the vector of $\mathsf{M}$
with the same coordinates in the corresponding basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ (the matrix of the
transformation $\mathcal{A}$ in such bases is the identity matrix of order $n$).

By the definition of an isomorphism, it suffices to prove that for an arbitrary
vector $\boldsymbol{y} \in \mathsf{M}$, there exists a vector $\boldsymbol{x} \in \mathsf{L}$ such that $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{y}$. Since the vectors
$\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ form a basis of the space $\mathsf{M}$, it follows that $\boldsymbol{y}$ can be expressed as a linear
combination of these vectors with certain coefficients $(\alpha_1, \dots, \alpha_n)$, from which by
the linearity of $\mathcal{A}$ it follows that

\[
\boldsymbol{y} = \alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_n \boldsymbol{f}_n = \mathcal{A}(\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n) = \mathcal{A}(\boldsymbol{x})
\]

with the vector $\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n$, which completes the proof of the theorem. □

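Theorem 3.69 If $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is a linear transformation of vector spaces of the same
finite dimension and the image $\mathcal{A}(\mathsf{L})$ is equal to $\mathsf{M}$, then $\mathcal{A}$ is an isomorphism.

Proof Let $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ be a basis of the vector space $\mathsf{M}$. By the condition of the
theorem, for each $\boldsymbol{f}_i$, there exists a vector $\boldsymbol{e}_i \in \mathsf{L}$ such that $\boldsymbol{f}_i = \mathcal{A}(\boldsymbol{e}_i)$. We shall
show that the vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ are linearly independent and therefore form a basis
of $\mathsf{L}$. Indeed, if there existed a collection of scalars $\alpha_1, \dots, \alpha_n$ such that
$\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n = \boldsymbol{0}$, then by $\mathcal{A}(\boldsymbol{0}) = \boldsymbol{0}'$ and the linearity of $\mathcal{A}$, we would have the equality

\[
\mathcal{A}(\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n) = \alpha_1 \mathcal{A}(\boldsymbol{e}_1) + \cdots + \alpha_n \mathcal{A}(\boldsymbol{e}_n)
  = \alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_n \boldsymbol{f}_n = \boldsymbol{0}',
\]

from which by the definition of a basis it would follow that all the $\alpha_i = 0$. That is, the
vectors $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ are linearly independent, and since $\dim \mathsf{L} = n$, they indeed form a
basis of the space $\mathsf{L}$.

It follows from the definition of a basis that an arbitrary vector $\boldsymbol{x} \in \mathsf{L}$ can be
written as $\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n$. From this, we obtain

\[
\mathcal{A}(\boldsymbol{x}) = \mathcal{A}(\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n) = \alpha_1 \mathcal{A}(\boldsymbol{e}_1) + \cdots + \alpha_n \mathcal{A}(\boldsymbol{e}_n)
  = \alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_n \boldsymbol{f}_n .
\]

If $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{0}'$, then we have $\alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_n \boldsymbol{f}_n = \boldsymbol{0}'$, which is possible only if all
the $\alpha_i$ are equal to 0, since the vectors $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ form a basis of the space $\mathsf{M}$. But
then, clearly, the vector $\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n$ equals $\boldsymbol{0}$. Therefore, the kernel of the
transformation $\mathcal{A}$ consists solely of the null vector, and by Theorem 3.68, $\mathcal{A}$ is an
isomorphism. □

It is not difficult to see that the theorems proved just above give us the following
result.

Theorem 3.70 A linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ between vector spaces of the
same finite dimension is an isomorphism if and only if it is nonsingular.

In other words, Theorem 3.70 asserts that for spaces of the same finite dimension,
the notion of a nonsingular transformation coincides with that of isomorphism.

With the proof of Theorem 3.68 we have also established one important fact:
a nonsingular linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ of vector spaces of the same finite
dimension maps a basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ of the space $\mathsf{L}$ to a basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ of the space
$\mathsf{M}$, and every vector $\boldsymbol{x} \in \mathsf{L}$ with coordinates $(\alpha_1, \dots, \alpha_n)$ in the first basis is mapped
to the vector $\mathcal{A}(\boldsymbol{x}) \in \mathsf{M}$ with the same coordinates relative to the second basis. This
clearly follows from formula (3.44).

Thus it is possible to define a nonsingular transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ by stating
that it maps a particular basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ of the space $\mathsf{L}$ into a basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ of the
space $\mathsf{M}$, and an arbitrary vector $\boldsymbol{x} \in \mathsf{L}$ with coordinates $(\alpha_1, \dots, \alpha_n)$ with respect
to the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ into the vector of $\mathsf{M}$ with the same coordinates with respect
to the basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$. Later, we will make use of this method in the case $\mathsf{L} = \mathsf{M}$,
when we will be studying certain special subsets $X \subset \mathsf{L}$, primarily quadrics. The
basic idea is that subsets $X$ and $Y$ are mapped into each other using a certain
nonsingular mapping $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ (that is, $Y = \mathcal{A}(X)$) if and only if there exist two bases
$\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ of the vector space $\mathsf{L}$ such that the condition of the vector
$\boldsymbol{x}$ belonging to the subset $X$ in coordinates relative to the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ coincides
with the condition of the same vector belonging to $Y$ in coordinates relative to the
basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$.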

In conclusion, let us return once more to Theorem 1.12, proved in Sect. 1.2, and
Corollary 1.13 (Fredholm alternative; see p. 11). This theorem and corollary are
now completely obvious, obtained as trivial consequences of a more general result.

Indeed, as we saw in Sect. 2.9, a system of $n$ linear equations in $n$ unknowns can
be written in matrix form $A[x] = [b]$, where $A$ is a square matrix of order $n$, $[x]$ is
a column vector consisting of the unknowns $x_1, \dots, x_n$, and $[b]$ is a column vector
consisting of the constants $b_1, \dots, b_n$. Let $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ be a linear transformation
between vector spaces of the same dimension $n$, having for some bases $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$
and $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ the matrix $A$. Let $\boldsymbol{b} \in \mathsf{M}$ be the vector whose coordinates in the
basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ are equal to $b_1, \dots, b_n$. Then we can interpret the linear system
$A[x] = [b]$ as the equation

\[
\mathcal{A}(\boldsymbol{x}) = \boldsymbol{b} \tag{3.45}
\]

with the unknown vector $\boldsymbol{x} \in \mathsf{L}$ whose coordinates in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ give the
solution $(x_1, \dots, x_n)$ to this system.

We have the following obvious alternative: Either the linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is an isomorphism, or else it is not. By Theorem 3.70, the first case
is equivalent to the mapping $\mathcal{A}$ being nonsingular. Then the kernel of $\mathcal{A}$ is equal to
$(\boldsymbol{0})$, and we have the image $\mathcal{A}(\mathsf{L}) = \mathsf{M}$. Consequently, for an arbitrary vector $\boldsymbol{b} \in \mathsf{M}$,
there exists (and indeed, it is unique) a vector $\boldsymbol{x} \in \mathsf{L}$ such that $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{b}$, that is,
equation (3.45) is solvable. In particular, from this we obtain Theorem 1.12 and its
corollary. In the second case, the kernel of $\mathcal{A}$ contains a nontrivial vector (the
associated homogeneous system has a nontrivial solution), and the image $\mathcal{A}(\mathsf{L})$ is not all
of the space $\mathsf{M}$, that is, there exists a vector $\boldsymbol{b} \in \mathsf{M}$ such that equation (3.45) has no
solution (the system $A[x] = [b]$ is inconsistent).

This assertion, that either equation (3.45) has a solution for every right-hand side
or the associated homogeneous equation has a nontrivial solution, holds also in the
case that $\mathcal{A}$ is a linear transformation (operator) in an infinite-dimensional space
satisfying a certain special condition. Such transformations occur in particular in
the theory of integral equations, where this assertion is given the name Fredholm
alternative.
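
As a small illustration of the two cases of the alternative for $n = 2$, consider the
following worked example. The matrices and right-hand sides are chosen here purely for
illustration and do not come from the text.

```latex
% Case 1: the matrix is nonsingular (|A| = 1), so A[x] = [b] has a unique
% solution for every right-hand side [b].
\[
A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
[x] = A^{-1}[b] = \begin{pmatrix} 2b_1 - b_2 \\ -b_1 + b_2 \end{pmatrix}.
\]
% Case 2: the matrix is singular (|A| = 0). The homogeneous system has a
% nontrivial solution, and some right-hand side admits no solution at all.
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}: \qquad
A \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\qquad\text{while}\qquad
A[x] = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \text{ has no solution,}
\]
% because x_1 + 2x_2 = 1 forces 2x_1 + 4x_2 = 2, which is not 0.
```

In the second case the kernel is spanned by the vector with coordinates $(2, -1)$, and the
image consists of the multiples of the column $(1, 2)$, so it is not the whole space.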


3.6 The Rank of a Linear Transformation 

In this section we shall look at linear transformations $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ without making
any assumptions about the dimensions $n$ and $m$ of the spaces $\mathsf{L}$ and $\mathsf{M}$ except
to assume that they are finite. We note that if $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ is any basis of the space
$\mathsf{L}$, then the image of $\mathcal{A}$ is equal to $\langle \mathcal{A}(\boldsymbol{e}_1), \dots, \mathcal{A}(\boldsymbol{e}_n) \rangle$. If we choose some basis
$\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$ of the space $\mathsf{M}$ and write the matrix of the transformation $\mathcal{A}$ with respect
to the chosen bases, then its columns will consist of the coordinates of the
vectors $\mathcal{A}(\boldsymbol{e}_1), \dots, \mathcal{A}(\boldsymbol{e}_n)$ in the basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_m$, and therefore, the dimension
of the image of $\mathcal{A}$ is equal to the greatest number of linearly independent vectors
among these columns, that is, the rank of the matrix of the linear transformation $\mathcal{A}$.
Thus the rank of the matrix of a linear transformation is independent of the bases
in which it is written, and therefore, we may speak of the rank of a linear
transformation. This allows us to give an equivalent definition of the rank of a linear
transformation that does not depend on the choice of coordinates.

Definition 3.71 The rank of a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is the dimension of
the vector space $\mathcal{A}(\mathsf{L})$.
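
That the rank does not depend on the bases can be checked on an example by passing from a
matrix $A$ to $A' = D^{-1}AC$, as in formula (3.41), and comparing ranks. The sketch below is an
illustration only: the matrices $A$, $C$, $D$ and the helper names `mat_mul` and `rank` are
assumptions introduced here, and the rank is computed by Gaussian elimination over the
rationals.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Return the matrix product XY for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def rank(rows):
    """Rank of a matrix (a list of rows), by Gaussian elimination over Q."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(M[0]) if M else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Matrix of type (m, n) = (2, 3); its second row is twice the first.
A = [[1, 2, 3],
     [2, 4, 6]]
# Nonsingular transition matrices: C of order n = 3, D of order m = 2.
C = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
D = [[1, 1],
     [1, 2]]
D_inv = [[2, -1],
         [-1, 1]]
assert mat_mul(D, D_inv) == [[1, 0], [0, 1]]
assert rank(C) == 3  # C is nonsingular

# Formula (3.41): matrix of the same transformation in the other bases.
A_new = mat_mul(D_inv, mat_mul(A, C))

# The entries change, but the rank does not.
assert rank(A_new) == rank(A)
```

Both matrices here have rank 1, in agreement with the fact that the image of the
transformation is one-dimensional whatever bases are used.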


The following theorem establishes a connection between the rank of a linear 
transformation and the dimension of its kernel, and it shows a very simple form into 
which the matrix of a linear transformation A : L -> M can be brought through a 
suitable choice of bases of both spaces. 


Theorem 3.72 For any linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ of
finite-dimensional vector spaces, the dimension of the kernel of $\mathcal{A}$ is equal to
$\dim \mathsf{L} - r$, where $r$ is the rank of $\mathcal{A}$. In the two spaces, it is
possible to choose bases in which the transformation $\mathcal{A}$ has a matrix in
block-diagonal form

$$\begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix}, \tag{3.46}$$

where $E_r$ is the identity matrix of order $r$.

Proof Let us denote the kernel of the transformation $\mathcal{A}$ by $\mathsf{L}'$, and its
image $\mathcal{A}(\mathsf{L})$ by $\mathsf{M}'$. We begin by proving the relationship

$$\dim \mathsf{L}' + \dim \mathsf{M}' = \dim \mathsf{L}. \tag{3.47}$$

By the definition of the rank of a transformation, we have here $r = \dim \mathsf{M}'$, and
thus the equality (3.47) gives precisely the first assertion of the theorem.

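Let us consider the mapping $\mathcal{A}' : \mathsf{L} \to \mathsf{M}'$ that assigns to each
vector $\boldsymbol{x} \in \mathsf{L}$ the vector $\boldsymbol{y} = \mathcal{A}(\boldsymbol{x})$
in $\mathsf{M}'$, which by assumption is the image of the mapping
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$. It is clear that such a mapping
$\mathcal{A}' : \mathsf{L} \to \mathsf{M}'$ is also a linear transformation.
In view of Corollary 3.31, we have the decomposition

$$\mathsf{L} = \mathsf{L}' \oplus \mathsf{L}'', \tag{3.48}$$

where $\mathsf{L}''$ is some subspace of $\mathsf{L}$. We now consider the restriction of the
transformation $\mathcal{A}'$ to the subspace $\mathsf{L}''$ and denote it by
$\mathcal{A}'' : \mathsf{L}'' \to \mathsf{M}'$. It is easily seen that the image of
$\mathcal{A}''$ coincides with the image of $\mathcal{A}'$, that is, is equal to $\mathsf{M}'$.
Indeed, since $\mathsf{M}'$ is the image of the original mapping
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$, every vector $\boldsymbol{y} \in \mathsf{M}'$ can be
represented in the form $\boldsymbol{y} = \mathcal{A}(\boldsymbol{x})$ with some
$\boldsymbol{x} \in \mathsf{L}$. But in view of the decomposition (3.48), we have the equality
$\boldsymbol{x} = \boldsymbol{u} + \boldsymbol{v}$, where $\boldsymbol{u} \in \mathsf{L}'$ and
$\boldsymbol{v} \in \mathsf{L}''$, and moreover, $\mathsf{L}'$ is the kernel of $\mathcal{A}$,
that is, $\mathcal{A}(\boldsymbol{u}) = \boldsymbol{0}'$. Consequently,
$\mathcal{A}(\boldsymbol{x}) = \mathcal{A}(\boldsymbol{u}) + \mathcal{A}(\boldsymbol{v}) = \mathcal{A}(\boldsymbol{v})$,
and this means that the vector $\boldsymbol{y} = \mathcal{A}(\boldsymbol{v})$ is the image of
the vector $\boldsymbol{v} \in \mathsf{L}''$.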

The kernel of the transformation $\mathcal{A}'' : \mathsf{L}'' \to \mathsf{M}'$ is equal to
$(\boldsymbol{0})$. Indeed, by definition, the kernel is equal to
$\mathsf{L}' \cap \mathsf{L}''$, and this intersection consists solely of the null vector,
since on the right-hand side of the decomposition (3.48) is to be found a direct
sum (see Corollary 3.15). As a result, we obtain that the image of the transformation
$\mathcal{A}'' : \mathsf{L}'' \to \mathsf{M}'$ is equal to $\mathsf{M}'$, while its kernel is
equal to $(\boldsymbol{0})$, that is, this transformation is an isomorphism. By Theorem 3.64,
it follows that $\dim \mathsf{L}'' = \dim \mathsf{M}'$. On the other hand, from the
decomposition (3.48) and Theorem 3.41, it follows that
$\dim \mathsf{L}' + \dim \mathsf{L}'' = \dim \mathsf{L}$. Replacing here $\dim \mathsf{L}''$ by
the equal number $\dim \mathsf{M}'$, we obtain the required equality (3.47).

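We shall now prove the assertion of the theorem about bringing the matrix of a
linear transformation $\mathcal{A}$ into the form (3.46). To this end, similar to the
decomposition (3.48) of the space $\mathsf{L}$, we make the decomposition
$\mathsf{M} = \mathsf{M}' \oplus \mathsf{M}''$, where $\mathsf{M}''$ is some subspace of
$\mathsf{M}$. By the fact proved above that $\dim \mathsf{L}' = n - r$ and in view
of (3.48), it follows that $\dim \mathsf{L}'' = r$. Let us now choose in the subspace
$\mathsf{L}''$ some basis $\boldsymbol{u}_1, \ldots, \boldsymbol{u}_r$ and set
$\boldsymbol{v}_i = \mathcal{A}''(\boldsymbol{u}_i)$, that is, by definition,
$\boldsymbol{v}_i = \mathcal{A}(\boldsymbol{u}_i)$. As we have seen, the transformation
$\mathcal{A}'' : \mathsf{L}'' \to \mathsf{M}'$ is an isomorphism, and therefore, the vectors
$\boldsymbol{v}_1, \ldots, \boldsymbol{v}_r$ form a basis of the space $\mathsf{M}'$, and
moreover, in the bases $\boldsymbol{u}_1, \ldots, \boldsymbol{u}_r$ and
$\boldsymbol{v}_1, \ldots, \boldsymbol{v}_r$, the transformation $\mathcal{A}''$ has the
identity matrix $E_r$ as its matrix.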

Let us now choose in the space $\mathsf{L}'$ some basis
$\boldsymbol{u}_{r+1}, \ldots, \boldsymbol{u}_n$ and combine it with the basis
$\boldsymbol{u}_1, \ldots, \boldsymbol{u}_r$ into the unified basis
$\boldsymbol{u}_1, \ldots, \boldsymbol{u}_n$ of the space $\mathsf{L}$. Similarly, we
extend the basis $\boldsymbol{v}_1, \ldots, \boldsymbol{v}_r$ to an arbitrary basis
$\boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_m$ of the space $\mathsf{M}$. What
will be the matrix of the linear transformation $\mathcal{A}$ in the constructed bases
$\boldsymbol{u}_1, \ldots, \boldsymbol{u}_n$ and $\boldsymbol{v}_1, \ldots, \boldsymbol{v}_m$?
It is clear that $\mathcal{A}(\boldsymbol{u}_i) = \boldsymbol{v}_i$ for $i = 1, \ldots, r$
(by construction, for these vectors, the transformation $\mathcal{A}''$ is the same as
$\mathcal{A}$).

On the other hand, $\mathcal{A}(\boldsymbol{u}_i) = \boldsymbol{0}'$ for $i = r + 1, \ldots, n$,
since the vectors $\boldsymbol{u}_{r+1}, \ldots, \boldsymbol{u}_n$ are contained in the kernel
of $\mathcal{A}$. Writing the coordinates of the vectors
$\mathcal{A}(\boldsymbol{u}_1), \ldots, \mathcal{A}(\boldsymbol{u}_n)$ in the basis
$\boldsymbol{v}_1, \ldots, \boldsymbol{v}_m$ as the columns of a matrix, we obtain that the
matrix of the transformation $\mathcal{A}$ has the block-diagonal form (3.46). □
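
The equality (3.47) can be checked on a concrete matrix. The following Python sketch is
only an illustration and is not part of the original lectures: the helper names
`row_reduce` and `kernel_basis` and the sample matrix `A` are assumptions made for this
example. It computes the rank by Gaussian elimination over the rational numbers, builds
a basis of the kernel from the free columns, and lets us compare the two dimensions
with the number of columns $n$.

```python
from fractions import Fraction


def row_reduce(rows):
    """Return (reduced row echelon form, list of pivot columns) over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def kernel_basis(rows):
    """Basis of the solution space of A x = 0, one vector per free column."""
    rref, pivots = row_reduce(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -rref[i][f]
        basis.append(v)
    return basis


# Hypothetical sample matrix of type (3, 4); its second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]

rank = len(row_reduce(A)[1])   # dimension of the image
kernel = kernel_basis(A)       # basis of the kernel
print(rank, len(kernel), len(A[0]))
```

Exact arithmetic with `Fraction` is chosen here so that the rank is not affected by
rounding; with floating-point numbers one would need a tolerance when testing for zero.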

Theorem 3.72 allows us to obtain a simpler and more natural proof of Theo- 
rem 2.63 from the previous section. 

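To this end, we note that every matrix is the matrix of some linear transformation
of vector spaces of suitable dimensions, and in particular, a nonsingular
square matrix represents an isomorphism of vector spaces of the same dimension.
For the matrices $A$, $B$, and $C$ of Theorem 2.63, let us consider the linear
transformations $\mathcal{A} : \mathsf{M} \to \mathsf{M}'$, $\mathcal{B} : \mathsf{L}' \to \mathsf{L}$,
and $\mathcal{C} : \mathsf{L} \to \mathsf{M}$, where $\dim \mathsf{L} = \dim \mathsf{L}' = n$ and
$\dim \mathsf{M} = \dim \mathsf{M}' = m$, having matrices $A$, $B$, and $C$ in some bases.

Let us find the rank of the transformation
$\mathcal{A}\mathcal{C}\mathcal{B} : \mathsf{L}' \to \mathsf{M}'$. From the equalities
$\mathcal{A}(\mathsf{M}) = \mathsf{M}'$ and $\mathcal{B}(\mathsf{L}') = \mathsf{L}$, it follows
that $\mathcal{A}\mathcal{C}\mathcal{B}(\mathsf{L}') = \mathcal{A}(\mathcal{C}(\mathsf{L}))$,
whence taking into account the isomorphism $\mathcal{A}$, we obtain that
$\dim \mathcal{A}\mathcal{C}\mathcal{B}(\mathsf{L}') = \dim \mathcal{C}(\mathsf{L})$. By
definition, the dimension of the image of a linear transformation is equal to its rank, which
coincides with the rank of its matrix, written in terms of arbitrary bases, from which
it follows that $\operatorname{rk} \mathcal{A}\mathcal{C}\mathcal{B} = \operatorname{rk} \mathcal{C}$.
From this, we finally obtain the required equality for the matrices themselves:
$\operatorname{rk} ACB = \operatorname{rk} C$.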

We would like to emphasize that the matrix of a transformation is reduced to the 
simple form (3.46) in the case that the spaces L and M are different from each other, 
and it follows that there is no possibility of coordinating their bases, and they are 
thus chosen independently of each other. We shall see below that in other cases (for 
example, if L = M), there is a more natural way of making this assignment when the 
bases of the spaces L and M are not chosen independently (for example, in the case 
L = M, it is simply one and the same basis). Then the question of the simplest form 
of the matrix of a transformation becomes much more complex. 

The statement of Theorem 3.72 on bringing the matrix of a linear transformation
into the form (3.46) can be reformulated. As we established in Sect. 3.4 (substitution
formula (3.41)), under a change of bases in the spaces $\mathsf{L}$ and $\mathsf{M}$, the
matrix $A$ of a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ is replaced by
the matrix $A' = D^{-1}AC$, where $C$ and $D$ are the transition matrices for the new bases
in the spaces $\mathsf{L}$ and $\mathsf{M}$. We know that the matrices $C$ and $D$ are
nonsingular, and conversely, any nonsingular square matrix of the appropriate order can be
taken as the transition matrix to a new basis. Therefore, Theorem 3.72 yields the following
corollary.

Corollary 3.73 For every matrix $A$ of type $(m, n)$, there exist nonsingular square
matrices $C$ and $D$ of orders $n$ and $m$ such that the matrix $D^{-1}AC$ has the form
(3.46).
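
As an illustration of Corollary 3.73, the following Python sketch checks a hand-computed
example of type $(2, 2)$ and rank 1. The matrices `A`, `C`, `D` and the helper `matmul`
are hypothetical choices made for this example: the columns of `C` are the vectors
$\boldsymbol{u}_1, \boldsymbol{u}_2$ and the columns of `D` are the vectors
$\boldsymbol{v}_1, \boldsymbol{v}_2$ built as in the proof of Theorem 3.72. To avoid
computing an inverse, the sketch compares $AC$ with $DE$, where $E$ is the matrix (3.46)
with $r = 1$; since $D$ is nonsingular, this is equivalent to $D^{-1}AC = E$.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 2],
     [2, 4]]          # rank 1; its kernel is spanned by (-2, 1)

# Columns of C: u_1 = (1, 0) spans a complement of the kernel, u_2 = (-2, 1) spans the kernel.
C = [[1, -2],
     [0, 1]]

# Columns of D: v_1 = A u_1 = (1, 2), extended by v_2 = (0, 1) to a basis of the plane.
D = [[1, 0],
     [2, 1]]

E = [[1, 0],
     [0, 0]]          # the block form (3.46) with r = 1

left = matmul(A, C)   # A C
right = matmul(D, E)  # D E
print(left == right)
```

Both `C` and `D` have determinant 1, so they are nonsingular as the corollary requires.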


3.7 Dual Spaces 

In this section, we shall examine the notion of a linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$ in the simplest case of $\dim \mathsf{M} = 1$. As a
result, we shall arrive at a concept very close to that with which we began our course in
Sect. 1.1, but now reformulated more abstractly, in terms of vector spaces. If
$\dim \mathsf{M} = 1$, then after selecting a basis in $\mathsf{M}$ (that is, some nonzero
vector $\boldsymbol{e}$), we can express any vector in this space in the form
$\alpha\boldsymbol{e}$, where $\alpha$ is a scalar (real, complex, or from an arbitrary field
$\mathbb{K}$, depending on the interpretation that the reader wishes to give to this term).
Identifying $\alpha\boldsymbol{e}$ with $\alpha$, we may consider in place of $\mathsf{M}$ the
collection of scalars ($\mathbb{R}$, $\mathbb{C}$, or $\mathbb{K}$). In connection with
this, we shall in this case denote the vector space $\mathcal{L}(\mathsf{L}, \mathsf{M})$
introduced in Sect. 3.3 by $\mathcal{L}(\mathsf{L}, \mathbb{K})$. It is called the space of
linear functions on $\mathsf{L}$.

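Therefore, a linear function on a space $\mathsf{L}$ is a mapping
$\boldsymbol{f} : \mathsf{L} \to \mathbb{K}$ that assigns to each vector
$\boldsymbol{x} \in \mathsf{L}$ the number $\boldsymbol{f}(\boldsymbol{x})$ and satisfies the
conditions

$$\boldsymbol{f}(\boldsymbol{x} + \boldsymbol{y}) = \boldsymbol{f}(\boldsymbol{x}) + \boldsymbol{f}(\boldsymbol{y}), \qquad \boldsymbol{f}(\alpha\boldsymbol{x}) = \alpha\boldsymbol{f}(\boldsymbol{x})$$

for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$ and scalars
$\alpha \in \mathbb{K}$.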

Example 3.74 If $\mathsf{L} = \mathbb{K}^n$ is the space of rows of length $n$ with elements
in the field $\mathbb{K}$, then the notion of linear function introduced above coincides with
the concept introduced in Sect. 1.1.

Example 3.75 Let $\mathsf{L}$ be the space of continuous functions on the interval $[a, b]$
taking real or complex values. For every function $\boldsymbol{x}(t)$ in $\mathsf{L}$, we set

$$\boldsymbol{f}_\varphi(\boldsymbol{x}) = \int_a^b \varphi(t)\,\boldsymbol{x}(t)\,dt, \tag{3.49}$$

where $\varphi(t)$ is some fixed function in $\mathsf{L}$. It is clear that
$\boldsymbol{f}_\varphi(\boldsymbol{x})$ is a linear function on $\mathsf{L}$.
We observe that in going through all functions $\varphi(t)$, we shall obtain by formula
(3.49) an infinite number of linear functions on $\mathsf{L}$, that is, elements of the space
$\mathcal{L}(\mathsf{L}, \mathbb{K})$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. However,
it is not possible to obtain all linear functions on $\mathsf{L}$ with the help of formula
(3.49). For example, let $s \in [a, b]$ be some fixed point on this interval. Consider the
mapping $\mathsf{L} \to \mathbb{K}$ that assigns to each function
$\boldsymbol{x}(t) \in \mathsf{L}$ its value at the point $s$. It is then clear that such a
mapping is a linear function on $\mathsf{L}$, but it is represented in the form (3.49) for no
function $\varphi(t)$.

Definition 3.76 If $\mathsf{L}$ is finite-dimensional, the space
$\mathcal{L}(\mathsf{L}, \mathbb{K})$ is called the dual to $\mathsf{L}$ and is denoted by
$\mathsf{L}^*$.

Remark 3.77 (The infinite-dimensional case) For an infinite-dimensional vector
space $\mathsf{L}$ (for example, the space of continuous functions on an interval considered
in Example 3.75), the dual space $\mathsf{L}^*$ is defined to be the space not of all
linear functions, but only of those satisfying the particular additional condition of
continuity (in the case of a finite-dimensional space, the requirement of continuity
is automatically satisfied).

The study of linear functions on infinite-dimensional vector spaces turns out to
be useful in many questions in analysis and mathematical physics. In this direction,
the remarkable idea arose to treat arbitrary linear functions as if they had been given
in the form (3.49), where $\varphi(t)$ is a certain “generalized function” that does not, in
general, belong to the initial space $\mathsf{L}$. This leads to new and interesting results.

For example, if we take as $\mathsf{L}$ the space of functions that are differentiable on the
interval $[a, b]$ and equal to zero at the endpoints, then for a differentiable function
$\varphi$ the rule of integration by parts can be written in the form
$\boldsymbol{f}_{\varphi'}(\boldsymbol{x}) = -\boldsymbol{f}_\varphi(\boldsymbol{x}')$.
But if the derivative $\varphi'$ does not exist, then it is possible to define a new,
“generalized,” function $\psi(t)$ by
$\boldsymbol{f}_\psi(\boldsymbol{x}) = -\boldsymbol{f}_\varphi(\boldsymbol{x}')$. In this case,
it is clear that $\psi(t) = \varphi'(t)$ if the derivative exists and is continuous. Thus it
is possible to define derivatives of arbitrary functions (including discontinuous and even
generalized functions).

For example, let us suppose that our interval $[a, b]$ contains in its interior the
point 0 and let us calculate the derivative of the function $h(t)$ that is equal to zero
for $t < 0$ and to 1 for $t \ge 0$, and consequently has a discontinuity at the point $t = 0$.
By definition, for any function $\boldsymbol{x}(t)$ in $\mathsf{L}$, we obtain the equality

$$\boldsymbol{f}_{h'}(\boldsymbol{x}) = -\boldsymbol{f}_h(\boldsymbol{x}') = -\int_a^b h(t)\,\boldsymbol{x}'(t)\,dt = -\int_0^b \boldsymbol{x}'(t)\,dt = \boldsymbol{x}(0) - \boldsymbol{x}(b) = \boldsymbol{x}(0),$$

since $\boldsymbol{x}(b) = 0$. Consequently, the derivative $h'(t)$ is a generalized function⁶
that assigns to each function $\boldsymbol{x}(t)$ in $\mathsf{L}$ its value at the point
$t = 0$.

We now return to exclusive consideration of the finite-dimensional case.

Theorem 3.78 If a vector space $\mathsf{L}$ is of finite dimension, then the dual space
$\mathsf{L}^*$ has the same dimension.

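Proof Let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ be any basis of the space $\mathsf{L}$.
Let us consider vectors $\boldsymbol{f}_i \in \mathsf{L}^*$, $i = 1, \ldots, n$, where
$\boldsymbol{f}_i$ is defined as a linear function that assigns to a vector

$$\boldsymbol{x} = \alpha_1\boldsymbol{e}_1 + \alpha_2\boldsymbol{e}_2 + \cdots + \alpha_n\boldsymbol{e}_n \tag{3.50}$$

its $i$th coordinate in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, that is,

$$\boldsymbol{f}_1(\boldsymbol{x}) = \alpha_1, \quad \ldots, \quad \boldsymbol{f}_n(\boldsymbol{x}) = \alpha_n. \tag{3.51}$$

We will thus obtain $n$ vectors in the dual space. Let us verify that they form a basis
of that space.

Let $\boldsymbol{f} = \beta_1\boldsymbol{f}_1 + \cdots + \beta_n\boldsymbol{f}_n$. Then applying
the function $\boldsymbol{f}$ to the vector $\boldsymbol{x}$ defined by the formula (3.50),
we obtain

$$\boldsymbol{f}(\boldsymbol{x}) = \alpha_1\beta_1 + \alpha_2\beta_2 + \cdots + \alpha_n\beta_n. \tag{3.52}$$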


In particular, assuming $\boldsymbol{x} = \boldsymbol{e}_i$, we obtain that
$\boldsymbol{f}(\boldsymbol{e}_i) = \beta_i$. Thus the equality $\boldsymbol{f} = \boldsymbol{0}$
(where $\boldsymbol{0}$ is the null vector of the space $\mathsf{L}^*$, that is, a linear
function on $\mathsf{L}$ identically equal to zero) means that
$\boldsymbol{f}(\boldsymbol{x}) = 0$ for every vector $\boldsymbol{x} \in \mathsf{L}$. It is
clear that this is the case if and only if $\beta_1 = 0, \ldots, \beta_n = 0$. By this we have
established the linear independence of the functions
$\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$. By equality (3.52), every linear function
on $\mathsf{L}$ can be expressed in the form
$\beta_1\boldsymbol{f}_1 + \cdots + \beta_n\boldsymbol{f}_n$ with coefficients
$\beta_i = \boldsymbol{f}(\boldsymbol{e}_i)$. This means that the functions
$\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ form a basis of $\mathsf{L}^*$, from which it
follows that $\dim \mathsf{L} = \dim \mathsf{L}^* = n$. □

6 Such a generalized function is called a Dirac delta function in honor of the English physicist Paul
Adrien Maurice Dirac, who was the first to use generalized functions (toward the end of the 1920s)
in his work on quantum mechanics.

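The basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ of the dual space $\mathsf{L}^*$
constructed according to formula (3.51) is called the dual to the basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the original vector space $\mathsf{L}$. It
is clear that it is defined by the formula

$$\boldsymbol{f}_i(\boldsymbol{e}_i) = 1, \qquad \boldsymbol{f}_i(\boldsymbol{e}_j) = 0 \quad \text{for } j \ne i.$$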
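
In coordinates, the dual basis can be computed by inverting a matrix. The following
Python sketch is an illustration under stated assumptions and not part of the original
text: we take $\mathsf{L} = \mathbb{Q}^2$, write the basis vectors as the columns of a
matrix `B`, and represent a linear function by its row of coefficients. The rows of
$B^{-1}$ then give the dual basis, since the product $B^{-1}B = E$ says exactly that the
$i$th row applied to the $j$th column is 1 for $i = j$ and 0 otherwise. The names
`inverse` and `apply` and the sample basis are hypothetical.

```python
from fractions import Fraction


def inverse(matrix):
    """Inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        # Raises StopIteration if the matrix is singular (no pivot in column c).
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                factor = aug[i][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def apply(f, x):
    """Value of the linear function with coefficient row f on the vector x."""
    return sum(fk * xk for fk, xk in zip(f, x))


B = [[1, 1],
     [1, 2]]                      # columns: e_1 = (1, 1), e_2 = (1, 2)
basis = [[row[j] for row in B] for j in range(len(B))]
dual = inverse(B)                 # rows: coefficient rows of f_1, f_2

# Table of values f_i(e_j); it should be the identity matrix.
table = [[apply(f, e) for e in basis] for f in dual]
print(dual, table)
```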

We observe that $\mathsf{L}$ and $\mathsf{L}^*$, like any two finite-dimensional vector spaces
of the same dimension, are isomorphic. (For infinite-dimensional vector spaces, this is not
in general the case, as in the case examined in Example 3.75 of the space $\mathsf{L}$ of
continuous functions on an interval, for which $\mathsf{L}$ and $\mathsf{L}^*$ are not
isomorphic.) However, the construction of an isomorphism between them requires the choice of
a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ in $\mathsf{L}$ and a basis
$\boldsymbol{f}_1, \boldsymbol{f}_2, \ldots, \boldsymbol{f}_n$ in $\mathsf{L}^*$. Thus between
$\mathsf{L}$ and $\mathsf{L}^*$ there does not exist a “natural” isomorphism independent of
the choice of basis. If we repeat the process of passage to the dual space twice, we will
obtain the space $(\mathsf{L}^*)^*$, for which it is easy to construct an isomorphism with the
original space $\mathsf{L}$ without resorting to the choice of a special basis. The space
$(\mathsf{L}^*)^*$ is called the second dual space to $\mathsf{L}$ and is denoted by
$\mathsf{L}^{**}$.

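Our immediate objective is to define a linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}^{**}$ that is an isomorphism. To do so, we need to
define $\mathcal{A}(\boldsymbol{x})$ for every vector $\boldsymbol{x} \in \mathsf{L}$. The
vector $\mathcal{A}(\boldsymbol{x})$ must lie in the space $\mathsf{L}^{**}$, that is, it must
be a linear function on the space $\mathsf{L}^*$. Since $\mathcal{A}(\boldsymbol{x})$ is an
element of the second dual space $\mathsf{L}^{**}$, it follows by definition that
$\mathcal{A}(\boldsymbol{x})$ is a linear transformation that assigns to each element
$\boldsymbol{f} \in \mathsf{L}^*$ (which itself is a linear function on $\mathsf{L}$) some
number, denoted by $\mathcal{A}(\boldsymbol{x})(\boldsymbol{f})$. We will define this
number by the natural condition

$$\mathcal{A}(\boldsymbol{x})(\boldsymbol{f}) = \boldsymbol{f}(\boldsymbol{x}) \quad \text{for all } \boldsymbol{x} \in \mathsf{L},\ \boldsymbol{f} \in \mathsf{L}^*. \tag{3.53}$$

The transformation $\mathcal{A}$ is in $\mathcal{L}(\mathsf{L}, \mathsf{L}^{**})$ (its
linearity is obvious). To verify that $\mathcal{A}$ is a bijection, we can use any basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ in $\mathsf{L}$ and the dual basis
$\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ in $\mathsf{L}^*$. Then, as is easy to verify,
$\mathcal{A}$ is the composition of two isomorphisms: the isomorphism
$\mathsf{L} \xrightarrow{\sim} \mathsf{L}^*$ constructed in the proof of Theorem 3.78 and the
analogous isomorphism $\mathsf{L}^* \xrightarrow{\sim} \mathsf{L}^{**}$, whence it follows that
$\mathcal{A}$ is itself an isomorphism.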

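The isomorphism $\mathsf{L} \xrightarrow{\sim} \mathsf{L}^{**}$ determined by condition (3.53)
shows that the vector spaces $\mathsf{L}$ and $\mathsf{L}^*$ play symmetric roles: each of them
is the dual of the other. To point out this symmetry more clearly, we shall find it convenient
to write the value $\boldsymbol{f}(\boldsymbol{x})$, where $\boldsymbol{x} \in \mathsf{L}$ and
$\boldsymbol{f} \in \mathsf{L}^*$, in the form $(\boldsymbol{x}, \boldsymbol{f})$. The
expression $(\boldsymbol{x}, \boldsymbol{f})$ possesses the following easily verified
properties:

1. $(\boldsymbol{x}_1 + \boldsymbol{x}_2, \boldsymbol{f}) = (\boldsymbol{x}_1, \boldsymbol{f}) + (\boldsymbol{x}_2, \boldsymbol{f})$;

2. $(\boldsymbol{x}, \boldsymbol{f}_1 + \boldsymbol{f}_2) = (\boldsymbol{x}, \boldsymbol{f}_1) + (\boldsymbol{x}, \boldsymbol{f}_2)$;

3. $(\alpha\boldsymbol{x}, \boldsymbol{f}) = \alpha(\boldsymbol{x}, \boldsymbol{f})$;

4. $(\boldsymbol{x}, \alpha\boldsymbol{f}) = \alpha(\boldsymbol{x}, \boldsymbol{f})$;

5. if $(\boldsymbol{x}, \boldsymbol{f}) = 0$ for all $\boldsymbol{x} \in \mathsf{L}$, then $\boldsymbol{f} = \boldsymbol{0}$;

6. if $(\boldsymbol{x}, \boldsymbol{f}) = 0$ for all $\boldsymbol{f} \in \mathsf{L}^*$, then $\boldsymbol{x} = \boldsymbol{0}$.

Conversely, if for two vector spaces $\mathsf{L}$ and $\mathsf{M}$, the function
$(\boldsymbol{x}, \boldsymbol{y})$ is defined, where $\boldsymbol{x} \in \mathsf{L}$ and
$\boldsymbol{y} \in \mathsf{M}$, taking numeric values and satisfying conditions (1)-(6), then
as is easily verified, $\mathsf{L} \simeq \mathsf{M}^*$ and $\mathsf{M} \simeq \mathsf{L}^*$.
We shall rely heavily on this fact in Chap. 6 in our study of bilinear forms.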

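Definition 3.79 Let $\mathsf{L}'$ be a subspace of the vector space $\mathsf{L}$. The set of
all $\boldsymbol{f} \in \mathsf{L}^*$ such that $\boldsymbol{f}(\boldsymbol{x}) = 0$ for all
$\boldsymbol{x} \in \mathsf{L}'$ is called the annihilator of the subspace $\mathsf{L}'$ and is
denoted by $(\mathsf{L}')^a$.

It follows at once from this definition that $(\mathsf{L}')^a$ is a subspace of
$\mathsf{L}^*$. Let us determine its dimension. Let $\dim \mathsf{L} = n$ and
$\dim \mathsf{L}' = r$. We choose a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r$ of
the subspace $\mathsf{L}'$, extend it to a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$
of the entire space $\mathsf{L}$, and consider the dual basis
$\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ of $\mathsf{L}^*$. From the definition of the dual
basis, it follows easily that a linear function $\boldsymbol{f} \in \mathsf{L}^*$ belongs to
$(\mathsf{L}')^a$ if and only if
$\boldsymbol{f} \in \langle \boldsymbol{f}_{r+1}, \ldots, \boldsymbol{f}_n \rangle$. In
other words, $(\mathsf{L}')^a = \langle \boldsymbol{f}_{r+1}, \ldots, \boldsymbol{f}_n \rangle$,
and this implies that

$$\dim (\mathsf{L}')^a = \dim \mathsf{L} - \dim \mathsf{L}'. \tag{3.54}$$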

If we now consider the natural isomorphism $\mathsf{L}^{**} \xrightarrow{\sim} \mathsf{L}$
defined above and with its help identify these spaces, then it is possible to apply the
construction given above to the annihilator $(\mathsf{L}')^a$ and examine the obtained
subspace $((\mathsf{L}')^a)^a$ in $\mathsf{L}$. From the definition, it follows that
$\mathsf{L}' \subset ((\mathsf{L}')^a)^a$. From the derived relationship (3.54) for
dimension, we obtain that $\dim ((\mathsf{L}')^a)^a = n - (n - r) = r$, and by Theorem 3.24,
it follows that $((\mathsf{L}')^a)^a = \mathsf{L}'$.

At the same time, we obtain that the subspace $\mathsf{L}'$ consists of all vectors
$\boldsymbol{x} \in \mathsf{L}$ for which

$$\boldsymbol{f}_{r+1}(\boldsymbol{x}) = 0, \quad \ldots, \quad \boldsymbol{f}_n(\boldsymbol{x}) = 0. \tag{3.55}$$

Thus an arbitrary subspace $\mathsf{L}'$ is defined by some system of linear equations (3.55).
This fact is well known in the case of lines and planes ($\dim \mathsf{L}' = 1, 2$) in
three-dimensional space from courses in analytic geometry. In the general case, this
assertion is the converse of what was proved in Example 3.8 (p. 84).

We have defined the correspondence $\mathsf{L}' \mapsto (\mathsf{L}')^a$ between subspaces
$\mathsf{L}' \subset \mathsf{L}$ and $(\mathsf{L}')^a \subset \mathsf{L}^*$, which in view of
the equality $((\mathsf{L}')^a)^a = \mathsf{L}'$ is a bijection. We shall denote
this correspondence by $\varepsilon$ and call it duality. Let us now point out some simple
properties of this correspondence.

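If $\mathsf{L}'$ and $\mathsf{L}''$ are two subspaces of $\mathsf{L}$, then

$$\varepsilon(\mathsf{L}' + \mathsf{L}'') = \varepsilon(\mathsf{L}') \cap \varepsilon(\mathsf{L}''). \tag{3.56}$$

In other words, this means that

$$(\mathsf{L}' + \mathsf{L}'')^a = (\mathsf{L}')^a \cap (\mathsf{L}'')^a. \tag{3.57}$$

Indeed, let $\boldsymbol{f} \in (\mathsf{L}')^a \cap (\mathsf{L}'')^a$. By the definition of
sum, for every vector $\boldsymbol{x} \in \mathsf{L}' + \mathsf{L}''$ we obtain the
representation $\boldsymbol{x} = \boldsymbol{x}' + \boldsymbol{x}''$, where
$\boldsymbol{x}' \in \mathsf{L}'$ and $\boldsymbol{x}'' \in \mathsf{L}''$, whence it follows
that $\boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x}') + \boldsymbol{f}(\boldsymbol{x}'') = 0$,
since $\boldsymbol{f} \in (\mathsf{L}')^a$ and $\boldsymbol{f} \in (\mathsf{L}'')^a$.
Consequently, $\boldsymbol{f} \in (\mathsf{L}' + \mathsf{L}'')^a$, and thus we have proved the
inclusion $(\mathsf{L}')^a \cap (\mathsf{L}'')^a \subset (\mathsf{L}' + \mathsf{L}'')^a$.
Let us now prove the reverse inclusion. Let $\boldsymbol{f} \in (\mathsf{L}' + \mathsf{L}'')^a$,
that is, $\boldsymbol{f}(\boldsymbol{x}) = 0$ for all vectors
$\boldsymbol{x} = \boldsymbol{x}' + \boldsymbol{x}''$, where $\boldsymbol{x}' \in \mathsf{L}'$
and $\boldsymbol{x}'' \in \mathsf{L}''$; in particular, this holds for all vectors
in both subspaces $\mathsf{L}'$ and $\mathsf{L}''$, that is, by the definition of the
annihilator, we obtain the relationships $\boldsymbol{f} \in (\mathsf{L}')^a$ and
$\boldsymbol{f} \in (\mathsf{L}'')^a$. Thus $\boldsymbol{f} \in (\mathsf{L}')^a \cap (\mathsf{L}'')^a$,
that is, $(\mathsf{L}' + \mathsf{L}'')^a \subset (\mathsf{L}')^a \cap (\mathsf{L}'')^a$. From
this, by the previous inclusion, we obtain relationship (3.57), and hence the relationship
(3.56).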

As a result, we may formulate the following almost obvious duality principle.
Later, we shall prove deeper versions of this principle.


Proposition 3.80 (Duality principle) If for all vector spaces of a given finite dimension
$n$ over a given field $\mathbb{K}$, a theorem is proven in whose formulation there appear
only the notions of subspace, dimension, sum, and intersection, then for all such
spaces, a dual theorem holds, obtained from the initial theorem via the following
substitution:

    dimension $r$                                  $\to$  dimension $n - r$
    intersection $\mathsf{L}' \cap \mathsf{L}''$   $\to$  sum $\mathsf{L}' + \mathsf{L}''$
    sum $\mathsf{L}' + \mathsf{L}''$               $\to$  intersection $\mathsf{L}' \cap \mathsf{L}''$

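Finally, we shall examine the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$.
Here, as with all functions, linear functions are written in reverse order to the order of the
sets on which they are defined; see p. xv in the Introduction. Using the notation of that
section, we define the set $T = \mathbb{K}$ and restrict the mapping
$\mathfrak{F}(\mathsf{M}, \mathbb{K}) \to \mathfrak{F}(\mathsf{L}, \mathbb{K})$ constructed there
to the subset $\mathsf{M}^* \subset \mathfrak{F}(\mathsf{M}, \mathbb{K})$, the space of linear
functions on $\mathsf{M}$. We observe that the image of $\mathsf{M}^*$ is contained in the
space $\mathsf{L}^* \subset \mathfrak{F}(\mathsf{L}, \mathbb{K})$, that is, it consists
of linear functions on $\mathsf{L}$. We shall denote this mapping by $\mathcal{A}^*$.
According to the definition on page xv, we define a linear transformation
$\mathcal{A}^* : \mathsf{M}^* \to \mathsf{L}^*$ by determining, for each vector
$\boldsymbol{g} \in \mathsf{M}^*$, its value from the equality

$$(\mathcal{A}^*(\boldsymbol{g}))(\boldsymbol{x}) = \boldsymbol{g}(\mathcal{A}(\boldsymbol{x})) \quad \text{for all } \boldsymbol{x} \in \mathsf{L}. \tag{3.58}$$

A trivial verification shows that $\mathcal{A}^*(\boldsymbol{g})$ is a linear function on
$\mathsf{L}$, and $\mathcal{A}^*$ is a linear transformation of $\mathsf{M}^*$ to
$\mathsf{L}^*$. The transformation $\mathcal{A}^*$ thus constructed is called the
dual transformation of $\mathcal{A}$. Using our earlier notation to write
$\boldsymbol{f}(\boldsymbol{x})$ as $(\boldsymbol{x}, \boldsymbol{f})$, we
can write the definition (3.58) in the following form:

$$(\boldsymbol{x}, \mathcal{A}^*(\boldsymbol{y})) = (\mathcal{A}(\boldsymbol{x}), \boldsymbol{y}) \quad \text{for all } \boldsymbol{x} \in \mathsf{L} \text{ and } \boldsymbol{y} \in \mathsf{M}^*.$$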

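Let us choose in the space $\mathsf{L}$ some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$,
and in $\mathsf{M}$, a basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$, and also dual bases
$\boldsymbol{e}_1^*, \ldots, \boldsymbol{e}_n^*$ in $\mathsf{L}^*$ and
$\boldsymbol{f}_1^*, \ldots, \boldsymbol{f}_m^*$ in $\mathsf{M}^*$.

Theorem 3.81 The matrix of a transformation $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ written in
terms of arbitrary bases of the spaces $\mathsf{L}$ and $\mathsf{M}$ and the matrix of the dual
transformation $\mathcal{A}^* : \mathsf{M}^* \to \mathsf{L}^*$ written in the dual bases in the
spaces $\mathsf{M}^*$ and $\mathsf{L}^*$ are transposes of each other.

Proof Let $A = (a_{ij})$ be the matrix of the transformation $\mathcal{A}$ in the bases
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$.
By formula (3.23), this means that

$$\mathcal{A}(\boldsymbol{e}_i) = \sum_{j=1}^{m} a_{ji}\boldsymbol{f}_j, \qquad i = 1, \ldots, n. \tag{3.59}$$

By the definition of the dual transformation (formula (3.58)), for every linear function
$\boldsymbol{f} \in \mathsf{M}^*$, the following equality holds:

$$(\mathcal{A}^*(\boldsymbol{f}))(\boldsymbol{e}_i) = \boldsymbol{f}(\mathcal{A}(\boldsymbol{e}_i)), \qquad i = 1, \ldots, n.$$

If $\boldsymbol{e}_1^*, \ldots, \boldsymbol{e}_n^*$ is the basis of $\mathsf{L}^*$ dual to the
basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of $\mathsf{L}$, and
$\boldsymbol{f}_1^*, \ldots, \boldsymbol{f}_m^*$ is the basis of $\mathsf{M}^*$ dual to the
basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$ of $\mathsf{M}$, then
$\mathcal{A}^*(\boldsymbol{f}_k^*)$ is a linear function on $\mathsf{L}$, as defined in (3.58).
In particular, applying $\mathcal{A}^*(\boldsymbol{f}_k^*)$ to the vector
$\boldsymbol{e}_i \in \mathsf{L}$, taking into account (3.58) and (3.59), we obtain

$$(\mathcal{A}^*(\boldsymbol{f}_k^*))(\boldsymbol{e}_i) = \boldsymbol{f}_k^*(\mathcal{A}(\boldsymbol{e}_i)) = \boldsymbol{f}_k^*\Bigl(\sum_{j=1}^{m} a_{ji}\boldsymbol{f}_j\Bigr) = \sum_{j=1}^{m} a_{ji}\,\boldsymbol{f}_k^*(\boldsymbol{f}_j),$$

and this number is equal to $a_{ki}$ by the definition of the dual basis. It is obvious
that this linear function on $\mathsf{L}$ is the function
$\sum_{i=1}^{n} a_{ki}\boldsymbol{e}_i^*$. Thus we obtain that the transformation
$\mathcal{A}^*$ assigns to the vector $\boldsymbol{f}_k^* \in \mathsf{M}^*$ the vector

$$\mathcal{A}^*(\boldsymbol{f}_k^*) = \sum_{i=1}^{n} a_{ki}\boldsymbol{e}_i^*, \qquad k = 1, \ldots, m, \tag{3.60}$$

of the space $\mathsf{L}^*$. Comparing formulas (3.59) and (3.60), we conclude that in the
given bases, the matrix of the transformation $\mathcal{A}^*$ is equal to $A^* = (a_{ji})$,
that is, the transpose of the matrix of the transformation $\mathcal{A}$. □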
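
Theorem 3.81 can be checked in coordinates. In the following Python sketch, which is an
illustration only, a vector $\boldsymbol{x} \in \mathsf{L}$ is given by its coordinates
in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and a linear function
$\boldsymbol{g} \in \mathsf{M}^*$ by its coordinates in the dual basis
$\boldsymbol{f}_1^*, \ldots, \boldsymbol{f}_m^*$, so that the value of a linear function on
a vector is the sum of products of coordinates. The sample data and the helper names
`transpose`, `matvec`, and `pair` are assumptions made for this example. The sketch
compares $\boldsymbol{g}(\mathcal{A}(\boldsymbol{x}))$ with the value on $\boldsymbol{x}$
of the function whose coordinates are obtained from those of $\boldsymbol{g}$ by the
transposed matrix, in accordance with (3.58).

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def matvec(A, x):
    """Coordinates of the image of x under the transformation with matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def pair(x, f):
    """Value (x, f) = f(x) computed from coordinates in dual bases."""
    return sum(xi * fi for xi, fi in zip(x, f))


A = [[1, 2, 0],
     [3, -1, 4]]      # matrix of a transformation L -> M with n = 3, m = 2
x = [2, 1, -1]        # coordinates of a vector of L
g = [5, -2]           # coordinates of a linear function on M

lhs = pair(x, matvec(transpose(A), g))   # (A* g)(x), using the transposed matrix
rhs = pair(matvec(A, x), g)              # g(A x)
print(lhs, rhs)
```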

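If we are given two linear transformations of vector spaces,
$\mathcal{A} : \mathsf{L} \to \mathsf{M}$ and $\mathcal{B} : \mathsf{M} \to \mathsf{N}$, then we
can define their composition $\mathcal{B}\mathcal{A} : \mathsf{L} \to \mathsf{N}$, which means
that its dual transformation is also defined, and is given by
$(\mathcal{B}\mathcal{A})^* : \mathsf{N}^* \to \mathsf{L}^*$. From the
condition (3.58), an immediate verification easily leads to the relation

$$(\mathcal{B}\mathcal{A})^* = \mathcal{A}^*\mathcal{B}^*. \tag{3.61}$$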

Together with Theorem 3.81, we thus obtain a new proof of equality (2.57), and
moreover, now no formulas are used; relationship (2.57) is obtained on the basis of
general notions.


3.8 Forms and Polynomials in Vectors

A natural generalization of the concept of linear function on a vector space is the
notion of form. It plays an important role in many branches of mathematics and in
mechanics and physics.

In the sequel, we shall assume that the vector space $\mathsf{L}$ on which we want to
define a form is defined over an arbitrary field $\mathbb{K}$. In the space $\mathsf{L}$, we
choose a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. Then every vector
$\boldsymbol{x} \in \mathsf{L}$ is uniquely defined by the choice of coordinates
$(x_1, \ldots, x_n)$ in the given basis.

Definition 3.82 A function $F : \mathsf{L} \to \mathbb{K}$ is called a polynomial on the space
$\mathsf{L}$ if $F(\boldsymbol{x})$ can be written as a polynomial in the coordinates
$x_1, \ldots, x_n$ of the vector $\boldsymbol{x}$, that is, $F(\boldsymbol{x})$ is a finite sum
of expressions of the form

$$c\,x_1^{k_1} \cdots x_n^{k_n}, \tag{3.62}$$

where $k_1, \ldots, k_n$ are nonnegative integers and the coefficient $c$ is in $\mathbb{K}$.
The expression (3.62) is called a monomial in the space $\mathsf{L}$, while the number
$k = k_1 + \cdots + k_n$ is called its degree. The degree of $F(\boldsymbol{x})$ is the maximum
over the degrees of the monomials that enter into it with nonzero coefficients $c$.

Let us note that for $n > 1$, a polynomial $F(\boldsymbol{x})$ of degree $k$ can have several
different monomials (3.62) of the same degree entering into it with nonzero coefficients $c$.

Definition 3.83 A polynomial $F(\boldsymbol{x})$ on a vector space $\mathsf{L}$ is said to be
homogeneous of degree $k$ or a form of degree $k$ (or frequently $k$-form) if every monomial
entering into $F(\boldsymbol{x})$ with nonzero coefficients is of degree $k$.

The definitions we have given require a bit of comment; indeed, we introduced
them having chosen a particular basis of the space $\mathsf{L}$, and now we need to show that
everything remains as defined under a change of basis; that is, if the function
$F(\boldsymbol{x})$ is a polynomial (or form) in the coordinates of the vector $\boldsymbol{x}$
in one basis, then it should be a polynomial (or form) of the same degree in the coordinates
of the vector $\boldsymbol{x}$ in any other basis. Indeed, using the formula for changing the
coordinates of a vector, that is, substituting relationships (3.35) into (3.62), it is easily
seen that under a change of basis, every monomial (3.62) of degree $k$ is converted to a sum of
monomials of the same degree. Consequently, a change of basis transforms the monomial
(3.62) of degree $k$ into a certain form $F'(\boldsymbol{x})$ of degree $k' \le k$. The reason
for the inequality here is that monomials entering into this form might cancel, resulting in a
leading-degree term that is equal to zero. However, it is easy to see that such cannot
occur. For example, using back-substitution, that is, substituting relationship (3.37)
into the form $F'(\boldsymbol{x})$, we will clearly again obtain the monomial (3.62).
Therefore, $k \le k'$. Thus we have established the equality $k' = k$. This establishes
everything that we needed to prove.

Forms of degree $k = 0$ are simply the constant functions, which assign to every
vector $\boldsymbol{x} \in \mathsf{L}$ one and the same number. Forms of degree $k = 1$ are
said to be linear, and these are precisely the linear functions on the space $\mathsf{L}$ that
we studied in detail in the previous section.

Forms of degree $k = 2$ are called quadratic; they play an especially important
role in courses in linear algebra as well as in many other branches of mathematics
and physics. In our course, an entire chapter will be devoted to quadratic forms
(Chap. 6).

We observe that we have in fact already encountered forms of arbitrary degree,
as shown in the following example.

Example 3.84 Let $F(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m)$ be a multilinear function on
$m$ rows of length $n$ (see the definition on p. 51). Since the space $\mathbb{K}^n$ of rows of
length $n$ is isomorphic to every $n$-dimensional vector space, we may view
$F(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m)$ as a multilinear function in $m$ vectors of the
space $\mathsf{L}$. Setting all the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m$ in
$\mathsf{L}$ equal to $\boldsymbol{x}$, then by Theorem 2.29, we obtain on the space
$\mathsf{L}$ the form $F(\boldsymbol{x}) = F(\boldsymbol{x}, \ldots, \boldsymbol{x})$ of
degree $m$.

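Let us denote by $F_k(\boldsymbol{x})$ the sum of all monomials of degree $k \ge 0$ appearing
in the polynomial $F(\boldsymbol{x})$ for a given choice of basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. Thus $F_k(\boldsymbol{x})$ is a form of
degree $k$, and we obtain the expression

$$F(\boldsymbol{x}) = F_0 + F_1(\boldsymbol{x}) + \cdots + F_m(\boldsymbol{x}), \tag{3.63}$$

in which $F_k(\boldsymbol{x}) = 0$ if there are no terms of degree $k$. For every form
$F_k(\boldsymbol{x})$ of degree $k$, the equation

$$F_k(\lambda\boldsymbol{x}) = \lambda^k F_k(\boldsymbol{x}) \tag{3.64}$$

is satisfied for every scalar $\lambda \in \mathbb{K}$ and every vector
$\boldsymbol{x} \in \mathsf{L}$ (clearly, it suffices to verify (3.64) for a monomial).
Substituting in relation (3.63) the vector $\lambda\boldsymbol{x}$ in place of
$\boldsymbol{x}$, we obtain

$$F(\lambda\boldsymbol{x}) = F_0 + \lambda F_1(\boldsymbol{x}) + \cdots + \lambda^m F_m(\boldsymbol{x}). \tag{3.65}$$

From this, it follows easily that the forms $F_i$ in the representation (3.63) are uniquely
determined by the polynomial $F$.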
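
The decomposition (3.63) and the relation (3.65) are easy to experiment with. In the
following Python sketch, offered only as an illustration, a polynomial in the coordinates
is stored as a dictionary that maps a tuple of exponents $(k_1, \ldots, k_n)$ to the
coefficient $c$ of the monomial (3.62); this representation, the helper names
`homogeneous_parts` and `evaluate`, and the sample polynomial are assumptions made for
this example.

```python
def homogeneous_parts(poly):
    """Group the monomials of a polynomial by degree: {k: F_k}."""
    parts = {}
    for exps, c in poly.items():
        parts.setdefault(sum(exps), {})[exps] = c
    return parts


def evaluate(poly, x):
    """Value of the polynomial at the vector with coordinates x."""
    total = 0
    for exps, c in poly.items():
        term = c
        for xi, k in zip(x, exps):
            term *= xi ** k
        total += term
    return total


# F = 7 + 3 x1 - x2^2 + 4 x1 x2 + 2 x1^3, a polynomial in two coordinates.
F = {(0, 0): 7, (1, 0): 3, (0, 2): -1, (1, 1): 4, (3, 0): 2}
parts = homogeneous_parts(F)

x, lam = (2, -1), 3
lhs = evaluate(F, tuple(lam * xi for xi in x))                      # F(lambda x)
rhs = sum(lam ** k * evaluate(Fk, x) for k, Fk in parts.items())    # sum of lambda^k F_k(x)
print(sorted(parts), lhs, rhs)
```

Integer coefficients are used so that the comparison of the two sides is exact.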

It is not difficult to see that the totality of all polynomials on the space $\mathsf{L}$
forms a vector space, which we shall denote by $\mathsf{A}$. This notation is connected with
the fact that the totality of all polynomials forms not only a vector space, but a richer and
more complex algebraic structure called an algebra. This means that in addition to
the operations of a vector space, in $\mathsf{A}$ is also defined the operation of the product
of every pair of elements satisfying certain conditions; see the definition on p. 370.
However, we shall not yet use this fact and will continue to view $\mathsf{A}$ solely as a
vector space.

Let us note that the space $\mathsf{A}$ is infinite-dimensional. Indeed, it suffices to
consider the infinite sequence of forms $F_k(\boldsymbol{x}) = x_i^k$, where $k$ runs through
the natural numbers, and the form $F_k(\boldsymbol{x})$ assigns to a vector $\boldsymbol{x}$
with coordinates $(x_1, \ldots, x_n)$ the $k$th power of its $i$th coordinate (the number $i$
may be fixed).

The totality of forms of fixed degree $k$ on a space $\mathsf{L}$ forms a subspace
$\mathsf{A}_k \subset \mathsf{A}$. Here $\mathsf{A}_0 = \mathbb{K}$, and $\mathsf{A}_1$
coincides with the space $\mathsf{L}^*$ of linear functions on $\mathsf{L}$. The
decomposition (3.63) could be interpreted as a decomposition of the space $\mathsf{A}$ as the
direct sum of an infinite number of subspaces $\mathsf{A}_k$ ($k = 0, 1, \ldots$) if we were to
define such a notion. In the field of algebra, the accepted name for this is graded algebra.

In the remainder of this section we shall look at two examples that use the
concepts just introduced. Here we shall use the rules for differentiating functions of
several variables (as applied to polynomials), which is something that might be new
to some readers. However, reference to the formulas thus obtained will occur only
at isolated places in the course, which can be omitted if desired. We present these
arguments only to emphasize the connection with other areas of mathematics.

Let us begin with reasoning that uses a certain coordinate system, that is, a choice
of some basis in the space $\mathsf{L}$. For the polynomial $F(x_1, \ldots, x_n)$, its partial
derivatives are defined by $\partial F/\partial x_i$, which are again polynomials. It is easy
to see that the mapping that assigns to every polynomial $F \in \mathsf{A}$ the polynomial
$\partial F/\partial x_i$ determines a linear transformation $\mathsf{A} \to \mathsf{A}$, which
we denote by $\partial/\partial x_i$. From these transformations we obtain new transformations
$\mathsf{A} \to \mathsf{A}$ of the form

$$\mathcal{D} = \sum_{i=1}^{n} P_i \frac{\partial}{\partial x_i}, \tag{3.66}$$

where the $P_i$ are arbitrary polynomials. Linear transformations of the form (3.66)
are called first-order differential operators. In analysis and geometry one considers
their analogues, whereby the $P_i$ are functions of a much more general class and the
space $\mathsf{A}$ is correspondingly enlarged. From the simplest properties of
differentiation, it follows that the linear operators $\mathcal{D}$ defined by formula (3.66)
exhibit the property

$$\mathcal{D}(FG) = F\,\mathcal{D}(G) + G\,\mathcal{D}(F) \tag{3.67}$$

for all $F \in \mathsf{A}$ and $G \in \mathsf{A}$.
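
Property (3.67) can be tested on concrete polynomials. The following Python sketch is
an illustration only; it stores a polynomial as a dictionary from exponent tuples to
integer coefficients, and the helper names `add`, `mul`, `partial`, and `operator`, as
well as the sample polynomials, are assumptions made for this example. The function
`operator` applies a first-order differential operator of the form (3.66) with the
given list of polynomials $P_i$.

```python
def add(p, q):
    """Sum of two polynomials, with zero coefficients removed."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}


def mul(p, q):
    """Product of two polynomials, with zero coefficients removed."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def partial(poly, i):
    """Partial derivative with respect to the coordinate with index i (0-based)."""
    return {e[:i] + (e[i] - 1,) + e[i + 1:]: c * e[i]
            for e, c in poly.items() if e[i] > 0}


def operator(P, poly):
    """Apply the operator sum_i P[i] * d/dx_i to a polynomial."""
    total = {}
    for i, Pi in enumerate(P):
        total = add(total, mul(Pi, partial(poly, i)))
    return total


# Two coordinates; P_1 = x1 + 1 and P_2 = x2^2 are arbitrary sample coefficients.
P = [{(1, 0): 1, (0, 0): 1}, {(0, 2): 1}]
F = {(1, 1): 1}                    # F = x1 x2
G = {(2, 0): 1, (0, 1): 3}         # G = x1^2 + 3 x2

left = operator(P, mul(F, G))                                   # D(FG)
right = add(mul(F, operator(P, G)), mul(G, operator(P, F)))     # F D(G) + G D(F)
print(left == right)
```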

Let us show that the converse also holds: an arbitrary linear transformation
$\mathcal{D} : \mathsf{A} \to \mathsf{A}$ satisfying condition (3.67) is a first-order
differential operator. To this end, we observe first that from the relation (3.67), it follows
that $\mathcal{D}(1) = 0$. Indeed, setting in (3.67) the polynomial $F = 1$, we obtain the
equality $\mathcal{D}(1 \cdot G) = 1 \cdot \mathcal{D}(G) + G\,\mathcal{D}(1)$. Canceling the
term $\mathcal{D}(G)$ on the left- and right-hand sides, we see that
$G\,\mathcal{D}(1) = 0$, and having selected as $G$ an arbitrary nonzero polynomial (even if
only $G = 1$), we obtain $\mathcal{D}(1) = 0$.

Let us now define a linear transformation $\mathcal{D}' : \mathsf{A} \to \mathsf{A}$
according to the formula

$$\mathcal{D}' = \mathcal{D} - \sum_{i=1}^{n} P_i \frac{\partial}{\partial x_i}, \qquad \text{where } P_i = \mathcal{D}(x_i).$$

It is easily seen that O'(l) = 0 and O '(x/) = 0 for ail indices i = 1, . . . , n. We ob- 
serve as well that the transformation O', like O, satisfies the relationship (3.67), 
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whence it follows that if 33 (F) = 0 and 33 (G) = 0, then also 33 (F G) = 0. There- 
fore, 33 r (F) — 0 if the polynomial F is the product of any two monomials from the 
collection 1, x \, . . . , x n . It is obvious that into the collection of such polynomials 
enter ail monomials of degree two, and consequently, for them we hâve 33' {F) = 0. 

Proceeding by induction, we can show that 33' (F) = 0 for ail monomials in 
for ail k , and therefore, this holds in general for ail forms Fk g A^. Finally, we recall 
that an arbitrary polynomial F g A is the sum of a finite number of homogeneous 
polynomials Fk G A k- Therefore, 33' (F) = 0 for ail F g A, which means that the 
transformation 33 has the form (3.66). 

The relationship (3.67) gives the definition of a first-order differential operator in
a way that does not depend on the coordinate system, that is, on the choice of basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$.


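Example 3.85 Let us consider the differential operator

$$\widetilde{\mathcal{D}} = \sum_{i=1}^{n} x_i \frac{\partial}{\partial x_i}.$$

It is clear that $\widetilde{\mathcal{D}}(x_i) = x_i$ for all $i = 1, \ldots, n$, from which it follows that for the
restriction to the subspace $\mathsf{A}_1 \subset \mathsf{A}$, the linear transformation $\widetilde{\mathcal{D}} : \mathsf{A}_1 \to \mathsf{A}_1$ becomes
the identity, that is, equal to $\mathcal{E}$. We shall prove that for the restriction to the subspace
$\mathsf{A}_k \subset \mathsf{A}$, the transformation $\widetilde{\mathcal{D}} : \mathsf{A}_k \to \mathsf{A}_k$ coincides with $k\mathcal{E}$. We shall proceed by
induction on $k$. We have already analyzed the case $k = 1$, and the case $k = 0$ is
obvious. Consider now polynomials $x_i G$, where $G \in \mathsf{A}_{k-1}$ and $i = 1, \ldots, n$. Then
from (3.67), we have the equality $\widetilde{\mathcal{D}}(x_i G) = x_i \widetilde{\mathcal{D}}(G) + G\,\widetilde{\mathcal{D}}(x_i)$. We have seen that
$\widetilde{\mathcal{D}}(x_i) = x_i$, and by induction, we may assume that $\widetilde{\mathcal{D}}(G) = (k - 1)G$. As a result,
we obtain the equality

$$\widetilde{\mathcal{D}}(x_i G) = x_i (k - 1) G + G x_i = k x_i G.$$

But every polynomial $F \in \mathsf{A}_k$ can be written as the sum of polynomials of the form
$x_i G_i$ with suitable $G_i \in \mathsf{A}_{k-1}$. Thus for an arbitrary polynomial $F \in \mathsf{A}_k$, we obtain
the relationship $\widetilde{\mathcal{D}}(F) = kF$. Written in coordinates, this takes the form

$$\sum_{i=1}^{n} x_i \frac{\partial F}{\partial x_i} = kF, \quad F \in \mathsf{A}_k, \tag{3.68}$$

and is called Euler's identity.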
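Euler's identity can be checked mechanically on a concrete form. The following is only a minimal sketch: the representation of a polynomial as a dictionary from exponent tuples to coefficients, and the helper names `partial`, `times_variable`, and `euler_operator`, are assumptions made for this illustration and do not come from the text.

```python
def partial(poly, i):
    """Partial derivative with respect to the i-th variable (0-based).

    A polynomial is a dict mapping exponent tuples to coefficients,
    e.g. {(2, 1): 3} stands for 3 * x_1^2 * x_2.
    """
    result = {}
    for exps, coeff in poly.items():
        if exps[i] > 0:
            new_exps = exps[:i] + (exps[i] - 1,) + exps[i + 1:]
            result[new_exps] = result.get(new_exps, 0) + coeff * exps[i]
    return result


def times_variable(poly, i):
    """Multiply a polynomial by the i-th variable."""
    return {exps[:i] + (exps[i] + 1,) + exps[i + 1:]: c
            for exps, c in poly.items()}


def euler_operator(poly, n):
    """Apply sum_i x_i * d/dx_i to a polynomial in n variables."""
    result = {}
    for i in range(n):
        for exps, c in times_variable(partial(poly, i), i).items():
            result[exps] = result.get(exps, 0) + c
    return {exps: c for exps, c in result.items() if c != 0}


# F = x1^3 + 2*x1*x2*x3 - 5*x2^2*x3 is a form of degree k = 3.
F = {(3, 0, 0): 1, (1, 1, 1): 2, (0, 2, 1): -5}
k = 3
print(euler_operator(F, 3) == {exps: k * c for exps, c in F.items()})  # True
```

The check relies on $F$ being homogeneous; for a polynomial mixing several degrees, each homogeneous component $F_k$ is multiplied by its own $k$.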


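Example 3.86 Let $F(\boldsymbol{x})$ be an arbitrary polynomial on the vector space $\mathsf{L}$. For a
variable $t \in \mathbb{R}$ and fixed vector $\boldsymbol{x} \in \mathsf{L}$, the function $F(t\boldsymbol{x})$, in view of relationships
(3.63) and (3.64), is a polynomial in the variable $t$. The expression

$$(d_{\boldsymbol{0}}F)(\boldsymbol{x}) = \frac{d}{dt}\bigg|_{t=0} F(t\boldsymbol{x}) \tag{3.69}$$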




is called the differential of the function $F(\boldsymbol{x})$ at the point $\boldsymbol{0}$. Let us point out that
the right-hand side of equality (3.69) contains the ordinary derivative of $F(t\boldsymbol{x})$
as a function of the variable $t \in \mathbb{R}$ at the point $t = 0$. On the left-hand side of
equality (3.69) and in the expression “differential of the function at the point $\boldsymbol{0}$,” the
symbol $\boldsymbol{0}$ signifies, as usual, the null vector of the space $\mathsf{L}$.

Let us now verify that $(d_{\boldsymbol{0}}F)(\boldsymbol{x})$ is a linear function in $\boldsymbol{x}$. To this end, we use
equality (3.65) for the polynomial $F(t\boldsymbol{x})$. From the relationship

$$F(t\boldsymbol{x}) = F_0 + t F_1(\boldsymbol{x}) + \cdots + t^m F_m(\boldsymbol{x}),$$

we obtain immediately that

$$\frac{d}{dt}\bigg|_{t=0} F(t\boldsymbol{x}) = F_1(\boldsymbol{x}),$$

where $F_1(\boldsymbol{x})$ is a linear function on $\mathsf{L}$. Thus in the decomposition (3.63) of the
polynomial $F(\boldsymbol{x})$, the second term is $F_1(\boldsymbol{x}) = (d_{\boldsymbol{0}}F)(\boldsymbol{x})$, and therefore $d_{\boldsymbol{0}}F$ is
frequently called the linear part of the polynomial $F$.

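We shall give an expression in coordinates for this important function. Using the
rules of differentiation for a function of several variables, we obtain

$$\frac{d}{dt} F(t\boldsymbol{x}) = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(t\boldsymbol{x})\, \frac{d(t x_i)}{dt} = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(t\boldsymbol{x})\, x_i.$$

Setting $t = 0$, we obtain from this formula

$$(d_{\boldsymbol{0}}F)(\boldsymbol{x}) = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(\boldsymbol{0})\, x_i. \tag{3.70}$$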


The coordinate representation (3.70) for the differential is quite convenient, but it
requires the selection of a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ in the space $\mathsf{L}$ and the notation
$\boldsymbol{x} = x_1\boldsymbol{e}_1 + \cdots + x_n\boldsymbol{e}_n$. The expression (3.69) alone shows that $(d_{\boldsymbol{0}}F)(\boldsymbol{x})$ does not depend
on the choice of basis. In analysis, both expressions (3.69) and (3.70) are defined
for functions of a much more general class than polynomials.

We note that for the polynomials $F(x_1, \ldots, x_n) = x_i$, we obtain with the help of
formula (3.70) the expression $(d_{\boldsymbol{0}}F)(\boldsymbol{x}) = x_i$. This indicates that the functions
$(d_{\boldsymbol{0}}x_1), \ldots, (d_{\boldsymbol{0}}x_n)$ form a basis of $\mathsf{L}^*$ dual to the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of $\mathsf{L}$.
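The two descriptions of the differential, as the coefficient of $t$ in $F(t\boldsymbol{x})$ and as the coordinate formula (3.70), can be compared on a small example. This is a sketch under stated assumptions: polynomials are stored as dictionaries from exponent tuples to coefficients, and the function names are hypothetical helpers introduced only for this illustration.

```python
def evaluate(poly, x):
    """Value of a polynomial (dict: exponent tuple -> coefficient) at x."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for xi, e in zip(x, exps):
            term *= xi ** e
        total += term
    return total


def partial(poly, i):
    """Partial derivative with respect to the i-th variable (0-based)."""
    result = {}
    for exps, coeff in poly.items():
        if exps[i] > 0:
            new_exps = exps[:i] + (exps[i] - 1,) + exps[i + 1:]
            result[new_exps] = result.get(new_exps, 0) + coeff * exps[i]
    return result


def coefficients_in_t(poly, x):
    """Coefficients of F(t*x) as a polynomial in t; index k holds F_k(x)."""
    degree = max(sum(exps) for exps in poly)
    coeffs = [0] * (degree + 1)
    for exps, coeff in poly.items():
        coeffs[sum(exps)] += evaluate({exps: coeff}, x)
    return coeffs


def differential_at_zero(poly, x):
    """Formula (3.70): sum over i of dF/dx_i(0) * x_i."""
    origin = (0,) * len(x)
    return sum(evaluate(partial(poly, i), origin) * x[i] for i in range(len(x)))


# F = 7 + 2*x1 - 3*x2 + 4*x1*x2 + x1^2, evaluated along the vector x = (5, 2).
F = {(0, 0): 7, (1, 0): 2, (0, 1): -3, (1, 1): 4, (2, 0): 1}
x = (5, 2)
print(coefficients_in_t(F, x)[1])  # 4, the derivative of F(t*x) at t = 0
print(differential_at_zero(F, x))  # 4, the value given by (3.70)
```

Both numbers equal the linear part $F_1(\boldsymbol{x}) = 2x_1 - 3x_2$ at $\boldsymbol{x} = (5, 2)$.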


Chapter 4 

Linear Transformations of a Vector Space to Itself


4.1 Eigenvectors and Invariant Subspaces 

In the previous chapter we introduced the notion of a linear transformation of a
vector space $\mathsf{L}$ into a vector space $\mathsf{M}$. In this and the following chapters, we shall
consider the important special case in which $\mathsf{M}$ coincides with $\mathsf{L}$, which in this book
will always be assumed to be finite-dimensional. Then a linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$ will be called a linear transformation of the space $\mathsf{L}$ to itself, or simply a
linear transformation of the space $\mathsf{L}$. This case is of great importance, since it is
encountered frequently in various fields of mathematics, mechanics, and physics.
We now recall some previously introduced facts regarding this case. First of all,
as before, we shall understand the term number or scalar in the broadest possible
sense, namely as a real or complex number or indeed as an element of any field $\mathbb{K}$
(of the reader's choosing).

As established in the preceding chapter, to represent a transformation $\mathcal{A}$ by a
matrix, one has to choose a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$ and then to write the
coordinates of the vectors $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_n)$ in terms of that basis as the columns
of a matrix. The result will be a square matrix $A$ of order $n$. If the transformation
$\mathcal{A}$ of the space $\mathsf{L}$ is nonsingular, then the vectors $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_n)$ themselves
form a basis of the space $\mathsf{L}$, and we may interpret $A$ as a transition matrix from
the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ to the basis $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_n)$. A nonsingular transformation $\mathcal{A}$
obviously has an inverse, $\mathcal{A}^{-1}$, with matrix $A^{-1}$.

Example 4.1 Let us write down the matrix of the linear transformation $\mathcal{A}$ that acts
by rotating the plane in the counterclockwise direction about the origin through the
angle $\alpha$. To do so, we first choose a basis consisting of two mutually perpendicular
vectors $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$ of unit length in the plane, where the vector $\boldsymbol{e}_2$ is obtained from
$\boldsymbol{e}_1$ by a counterclockwise rotation through a right angle (see Fig. 4.1).

[Fig. 4.1 Rotation through the angle $\alpha$]

Then it is easy to see that we obtain the relationships

$$\mathcal{A}(\boldsymbol{e}_1) = \cos\alpha\,\boldsymbol{e}_1 + \sin\alpha\,\boldsymbol{e}_2, \qquad \mathcal{A}(\boldsymbol{e}_2) = -\sin\alpha\,\boldsymbol{e}_1 + \cos\alpha\,\boldsymbol{e}_2,$$

and it follows from the definition that the matrix of the transformation $\mathcal{A}$ in the
given basis is equal to

$$A = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}. \tag{4.1}$$

Example 4.2 Consider the linear transformation $\mathcal{A}$ of the complex plane that consists
in multiplying each number $z \in \mathbb{C}$ by a given fixed complex number $p + iq$
(here $i$ is the imaginary unit).

If we consider the complex plane as a vector space $\mathsf{L}$ over the field $\mathbb{C}$, then it is
clear that in an arbitrary basis of the space $\mathsf{L}$, such a transformation $\mathcal{A}$ has a matrix of
order 1, consisting of a unique element, namely the given complex number $p + iq$.
Indeed, in this case we have $\dim \mathsf{L} = 1$, and we need to choose in $\mathsf{L}$ a basis consisting
of an arbitrary nonzero vector in $\mathsf{L}$, that is, an arbitrary complex number $z \neq 0$. Thus
we obtain $\mathcal{A}(z) = (p + iq)z$.

Now let us consider the complex plane as a vector space $\mathsf{L}$ over the field $\mathbb{R}$. In
this case, $\dim \mathsf{L} = 2$, since every complex number $z = x + iy$ is represented by a pair
of real numbers $x$ and $y$. Let us choose in $\mathsf{L}$ the same kind of basis as in Example 4.1,
with the vector $\boldsymbol{e}_1$ lying on the real axis and the vector $\boldsymbol{e}_2$ on the imaginary
axis. From the equation

$$(x + iy)(p + iq) = (px - qy) + i(py + qx)$$

it follows that

$$\mathcal{A}(\boldsymbol{e}_1) = p\,\boldsymbol{e}_1 + q\,\boldsymbol{e}_2, \qquad \mathcal{A}(\boldsymbol{e}_2) = -q\,\boldsymbol{e}_1 + p\,\boldsymbol{e}_2,$$

from which it follows by definition that the matrix of the transformation $\mathcal{A}$ in the
given basis takes the form

$$A = \begin{pmatrix} p & -q \\ q & p \end{pmatrix}. \tag{4.2}$$

In the case $|p + iq| = 1$, we may put $p = \cos\alpha$ and $q = \sin\alpha$ for a certain number
$0 \le \alpha < 2\pi$ (such an $\alpha$ is called the argument of the complex number $p + iq$). Then
the matrix (4.2) coincides with (4.1); that is, multiplication by a complex number
with modulus 1 and argument $\alpha$ is equivalent to the counterclockwise rotation about
the origin of the complex plane through the angle $\alpha$. We note that every complex
number $p + iq$ can be expressed as the product of a real number $r$ and a complex
number of modulus 1; that is, $p + iq = r(p' + iq')$, where $|p' + iq'| = 1$ and
$r = |p + iq|$. From this it is clear that multiplication by $p + iq$ is the product of two
linear transformations of the complex plane: a rotation through the angle $\alpha$ and a
dilation (or contraction) by the factor $r$.
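The correspondence between multiplication by $p + iq$ and the matrix (4.2) is easy to test numerically. The sketch below assumes the basis $\boldsymbol{e}_1 = 1$, $\boldsymbol{e}_2 = i$ of $\mathbb{C}$ over $\mathbb{R}$; the function names are hypothetical, and `cmath.polar` returns an argument in $(-\pi, \pi]$ rather than in $[0, 2\pi)$, which does not affect the rotation matrix.

```python
import cmath
import math


def matrix_of_multiplication(w):
    """Matrix (4.2) of z -> w*z in the assumed basis e1 = 1, e2 = i."""
    p, q = w.real, w.imag
    return [[p, -q], [q, p]]


def apply(matrix, vector):
    """Multiply a matrix by a column of coordinates."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


w = 3 + 4j
z = 2 - 1j

# Coordinates of w*z computed through the matrix and directly.
print(apply(matrix_of_multiplication(w), [z.real, z.imag]))  # [10.0, 5.0]
print(w * z)                                                 # (10+5j)

# Decomposition into a rotation through alpha and a dilation by r = |w|.
r, alpha = cmath.polar(w)
rotation = [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]
dilated_rotation = [[r * entry for entry in row] for row in rotation]
# Up to rounding, dilated_rotation equals matrix_of_multiplication(w).
```

Here $r = 5$, and the rotation matrix multiplied by $r$ reproduces (4.2) with $p = 3$, $q = 4$ up to floating-point rounding.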

In Sect. 3.4, we established that in the transition from a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the
space $\mathsf{L}$ to some other basis $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$, the matrix of the transformation is changed
according to the formula

$$A' = C^{-1} A C, \tag{4.3}$$

where $C$ is the transition matrix from the second basis to the first.

Definition 4.3 Two square matrices $A$ and $A'$ related by (4.3), where $C$ is any
nonsingular matrix, are said to be similar.

It is not difficult to see that in the set of square matrices of a given order, the
similarity relation thus defined is an equivalence relation (see the definition on p. xii).

It follows from formula (4.3) that in changing bases, the determinant of the
transformation matrix does not change, and therefore it is possible to speak not simply
about the determinant of the transformation matrix, but about the determinant of the
linear transformation $\mathcal{A}$ itself, which will be denoted by $|\mathcal{A}|$. A linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$ is nonsingular if and only if $|\mathcal{A}| \neq 0$. If $\mathsf{L}$ is a real space, then this
number $|\mathcal{A}| \neq 0$ is also real and can be either positive or negative.

Definition 4.4 A nonsingular linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ of the real space $\mathsf{L}$ is
called proper if $|\mathcal{A}| > 0$, and improper if $|\mathcal{A}| < 0$.

One of the basic tasks in the theory of linear transformations, one with which 
we shall be occupied in the sequel, is to find, given a linear transformation of a 
vector space into itself, a basis for which the matrix of the transformation takes the 
simplest possible form. An équivalent formulation of this task is for a given square 
matrix to find the simplest matrix that is similar to it. Having such a basis (or similar 
matrix) gives us the possibility of surveying a number of important properties of the 
initial linear transformation (or matrix). In its most general form, this problem will 
be solved in Chap. 5, but at present, we shall examine it for a particular type of
linear transformation that is most frequently encountered. 

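Definition 4.5 A subspace $\mathsf{L}'$ of a vector space $\mathsf{L}$ is called invariant with respect to
the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ if for every vector $\boldsymbol{x} \in \mathsf{L}'$, we have $\mathcal{A}(\boldsymbol{x}) \in \mathsf{L}'$.

It is clear that according to this definition, the zero subspace $(\boldsymbol{0})$ and the entire
space $\mathsf{L}$ are invariant with respect to any linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$. Thus
whenever we enumerate the invariant subspaces of a space $\mathsf{L}$, we shall always mean
the subspaces $\mathsf{L}' \subset \mathsf{L}$ other than $(\boldsymbol{0})$ and $\mathsf{L}$.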

Example 4.6 Let $\mathsf{L}$ be the three-dimensional space studied in courses in analytic
geometry consisting of vectors originating at a given fixed point $O$, and consider the
transformation $\mathcal{A}$ that reflects each vector with respect to a given plane $\mathsf{L}'$ passing
through the point $O$. It is then easy to see that $\mathcal{A}$ has two invariant subspaces: the
plane $\mathsf{L}'$ itself and the straight line $\mathsf{L}''$ passing through $O$ and perpendicular to $\mathsf{L}'$.

Example 4.7 Let $\mathsf{L}$ be the same space as in the previous example, and now let the
transformation $\mathcal{A}$ be a rotation through the angle $\alpha$, $0 < \alpha < \pi$, about a given axis
$\mathsf{L}'$ passing through $O$. Then $\mathcal{A}$ has two invariant subspaces: the line $\mathsf{L}'$ itself and the
plane $\mathsf{L}''$ perpendicular to $\mathsf{L}'$ and passing through $O$.

Example 4.8 Let $\mathsf{L}$ be the same as in the previous example, and let $\mathcal{A}$ be a
homothety, that is, $\mathcal{A}$ acts by multiplying each vector by a fixed number $\alpha \neq 0$. Then it
is easy to see that every line and every plane passing through $O$ is an invariant
subspace with respect to the transformation $\mathcal{A}$. Moreover, it is not difficult to observe
that if $\mathcal{A}$ is a homothety on an arbitrary vector space $\mathsf{L}$, then every subspace of $\mathsf{L}$ is
invariant.

Example 4.9 Let $\mathsf{L}$ be the plane consisting of all vectors originating at some point
$O$, and let $\mathcal{A}$ be the transformation that rotates a vector about $O$ through the angle $\alpha$,
$0 < \alpha < \pi$. Then $\mathcal{A}$ has no invariant subspace.

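It is evident that the restriction of a linear transformation $\mathcal{A}$ to an invariant
subspace $\mathsf{L}' \subset \mathsf{L}$ is a linear transformation of $\mathsf{L}'$ into itself. We shall denote this
transformation by $\mathcal{A}'$, that is, $\mathcal{A}' : \mathsf{L}' \to \mathsf{L}'$ and $\mathcal{A}'(\boldsymbol{x}) = \mathcal{A}(\boldsymbol{x})$ for all $\boldsymbol{x} \in \mathsf{L}'$.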

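Let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ be a basis of the subspace $\mathsf{L}'$. Then since it consists of linearly
independent vectors, it is possible to extend it to a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the entire
space $\mathsf{L}$. Let us examine how the matrix of the linear transformation $\mathcal{A}$ appears in
this basis. The vectors $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_m)$ are expressed as linear combinations of
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$; this is equivalent to saying that $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ is the basis of a subspace that
is invariant with respect to the transformation $\mathcal{A}$. We therefore obtain the system of
equations

$$\begin{aligned}
\mathcal{A}(\boldsymbol{e}_1) &= a_{11}\boldsymbol{e}_1 + a_{21}\boldsymbol{e}_2 + \cdots + a_{m1}\boldsymbol{e}_m, \\
\mathcal{A}(\boldsymbol{e}_2) &= a_{12}\boldsymbol{e}_1 + a_{22}\boldsymbol{e}_2 + \cdots + a_{m2}\boldsymbol{e}_m, \\
&\;\;\vdots \\
\mathcal{A}(\boldsymbol{e}_m) &= a_{1m}\boldsymbol{e}_1 + a_{2m}\boldsymbol{e}_2 + \cdots + a_{mm}\boldsymbol{e}_m.
\end{aligned}$$

It is clear that the matrix

$$A' = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mm} \end{pmatrix} \tag{4.4}$$

is the matrix of the linear transformation $\mathcal{A}' : \mathsf{L}' \to \mathsf{L}'$ in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$. In
general, we can say nothing about the vectors $\mathcal{A}(\boldsymbol{e}_i)$ for $i > m$ except that they are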







linear combinations of vectors from the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the entire space $\mathsf{L}$.
However, we shall represent this by separating out terms that are multiples of $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$
(we shall write the associated coefficients as $b_{ij}$) and those that are multiples of the
vectors $\boldsymbol{e}_{m+1}, \ldots, \boldsymbol{e}_n$ (here we shall write the associated coefficients as $c_{ij}$). As a
result we obtain the matrix

$$A = \begin{pmatrix} A' & B' \\ 0 & C' \end{pmatrix}, \tag{4.5}$$

where $B'$ is a matrix of type $(m, n - m)$, $C'$ is a square matrix of order $n - m$, and
$0$ is a matrix of type $(n - m, m)$ all of whose elements are equal to zero.

If it turns out to be possible to find an invariant subspace $\mathsf{L}''$ related to the
invariant subspace $\mathsf{L}'$ by $\mathsf{L} = \mathsf{L}' \oplus \mathsf{L}''$, then by joining the bases of $\mathsf{L}'$ and $\mathsf{L}''$, we obtain
a basis for the space $\mathsf{L}$ in which the matrix of our linear transformation $\mathcal{A}$ can be
written in the form

$$A = \begin{pmatrix} A' & 0 \\ 0 & C' \end{pmatrix},$$

where $A'$ is the matrix (4.4) and $C'$ is the matrix of the linear transformation
obtained by restricting the transformation $\mathcal{A}$ to the subspace $\mathsf{L}''$. Analogously, if

$$\mathsf{L} = \mathsf{L}_1 \oplus \mathsf{L}_2 \oplus \cdots \oplus \mathsf{L}_k,$$

where all the $\mathsf{L}_i$ are invariant subspaces with respect to the transformation $\mathcal{A}$, then
the matrix of the transformation $\mathcal{A}$ can be written in the form

$$A = \begin{pmatrix} A'_1 & 0 & \cdots & 0 \\ 0 & A'_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A'_k \end{pmatrix}, \tag{4.6}$$

where $A'_i$ is the matrix of the linear transformation obtained by restricting $\mathcal{A}$ to the
invariant subspace $\mathsf{L}_i$. Matrices of the form (4.6) are called block-diagonal.

The simplest case is that of an invariant subspace of dimension 1. This subspace
has a basis consisting of a single vector $\boldsymbol{e} \neq \boldsymbol{0}$, and its invariance is expressed by the
relationship

$$\mathcal{A}(\boldsymbol{e}) = \lambda\boldsymbol{e} \tag{4.7}$$

for some number $\lambda$.

Definition 4.10 If the relationship (4.7) is satisfied for a vector $\boldsymbol{e} \neq \boldsymbol{0}$, then $\boldsymbol{e}$ is
called an eigenvector, and the number $\lambda$ is called an eigenvalue of the transformation
$\mathcal{A}$.

Given an eigenvalue $\lambda$, it is easy to verify that the set of all vectors $\boldsymbol{e} \in \mathsf{L}$
satisfying the relationship (4.7), including here also the zero vector, forms an invariant
subspace of $\mathsf{L}$. It is called the eigensubspace for the eigenvalue $\lambda$ and is denoted
by $\mathsf{L}_\lambda$.


Example 4.11 In Example 4.6, the eigenvectors of the transformation $\mathcal{A}$ are, first
of all, all the vectors in the plane $\mathsf{L}'$ (in this case the eigenvalue is $\lambda = 1$), and
secondly, every vector on the line $\mathsf{L}''$ (the eigenvalue is $\lambda = -1$). In Example 4.7,
the eigenvectors are all vectors lying on the line $\mathsf{L}'$, and to them corresponds the
eigenvalue $\lambda = 1$. In Example 4.8, every vector in the space is an eigenvector with
eigenvalue $\lambda = \alpha$. Of course all the vectors that we are speaking about are nonzero
vectors.

Example 4.12 Let $\mathsf{L}$ be the space consisting of all infinitely differentiable functions,
and let the transformation $\mathcal{A}$ be differentiation, that is, it maps every function $x(t)$ in
$\mathsf{L}$ to its derivative $x'(t)$. Then the eigenvectors of $\mathcal{A}$ are the functions $x(t)$, not
identically zero, that are solutions of the differential equation $x'(t) = \lambda x(t)$. One easily
verifies that such solutions are the functions $x(t) = ce^{\lambda t}$, where $c$ is an arbitrary
constant. It follows that to every number $\lambda$ there corresponds a one-dimensional
invariant subspace of the transformation $\mathcal{A}$ consisting of all vectors $x(t) = ce^{\lambda t}$, and
for $c \neq 0$ these are eigenvectors.

There is a convenient method for finding eigenvalues of a transformation $\mathcal{A}$ and
the associated subspaces. We must first choose an arbitrary basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the
space $\mathsf{L}$ and then search for vectors $\boldsymbol{e}$ that satisfy relation (4.7), in the form of the
linear combination

$$\boldsymbol{e} = x_1\boldsymbol{e}_1 + x_2\boldsymbol{e}_2 + \cdots + x_n\boldsymbol{e}_n. \tag{4.8}$$

Let the matrix of the linear transformation $\mathcal{A}$ in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ be $A = (a_{ij})$.
Then the coordinates $y_1, \ldots, y_n$ of the vector $\mathcal{A}(\boldsymbol{e})$ in the same basis can be expressed by the
equations

$$\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n, \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n, \\
&\;\;\vdots \\
y_n &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n.
\end{aligned}$$

Now we can write down relation (4.7) in the form

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= \lambda x_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= \lambda x_2, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= \lambda x_n,
\end{aligned}$$

or equivalently,

$$\begin{aligned}
(a_{11} - \lambda)x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0, \\
a_{21}x_1 + (a_{22} - \lambda)x_2 + \cdots + a_{2n}x_n &= 0, \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + (a_{nn} - \lambda)x_n &= 0.
\end{aligned} \tag{4.9}$$

For the coordinates $x_1, x_2, \ldots, x_n$ of the vector (4.8), we obtain a system of $n$
homogeneous linear equations. By Corollary 2.13, this system will have a nonzero
solution if and only if the determinant of its matrix is equal to zero. We may write
this condition in the form

$$|A - \lambda E| = 0.$$

Using the formula for the expansion of the determinant, we see that the determinant
$|A - tE|$ is a polynomial in $t$ of degree $n$. It is called the characteristic polynomial
of the transformation $\mathcal{A}$. The eigenvalues of $\mathcal{A}$ are precisely the zeros of this
polynomial.
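As a small illustration of this method, one can evaluate $|A - tE|$ at a few values of $t$ and confirm that its zeros are eigenvalues. The matrix below and the helper names `det`, `char_poly_at`, and `apply` are assumptions chosen for this sketch; the determinant is computed by expansion along the first row, which is adequate only for small matrices.

```python
def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def char_poly_at(a, t):
    """Value of the characteristic polynomial |A - tE| at the number t."""
    n = len(a)
    return det([[a[i][j] - (t if i == j else 0) for j in range(n)]
                for i in range(n)])


def apply(a, v):
    """Coordinates of the image of the vector v under the matrix a."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


A = [[2, 1, 0],
     [1, 2, 0],
     [0, 0, 3]]

# |A - tE| = (3 - t)(1 - t)(3 - t), so t = 1 and t = 3 are zeros, t = 2 is not.
print([char_poly_at(A, t) for t in (1, 2, 3)])  # [0, -1, 0]

# Nonzero solutions of system (4.9) for these zeros are eigenvectors.
print(apply(A, [1, -1, 0]))  # [1, -1, 0], eigenvalue 1
print(apply(A, [1, 1, 0]))   # [3, 3, 0], eigenvalue 3
```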

Let us prove that the characteristic polynomial is independent of the basis in
which we write down the matrix of the transformation. It is only after we have
accomplished this that we shall have the right to speak of the characteristic polynomial
of the transformation itself and not merely of its matrix in a particular basis.

Indeed, as we have seen (formula (4.3)), in another basis we obtain the matrix
$A' = C^{-1}AC$, where $|C| \neq 0$. For this matrix, the characteristic polynomial is

$$|A' - tE| = |C^{-1}AC - tE| = |C^{-1}(A - tE)C|.$$

Using the formula for the multiplication of determinants and the formula for the
determinant of an inverse matrix, we obtain

$$|C^{-1}(A - tE)C| = |C^{-1}| \cdot |A - tE| \cdot |C| = |A - tE|.$$

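If a space has a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ consisting of eigenvectors, then in this basis, we
have $\mathcal{A}(\boldsymbol{e}_i) = \lambda_i\boldsymbol{e}_i$. From this, it follows that the matrix of a transformation $\mathcal{A}$ in
this basis has the diagonal form

$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

This is a special case of (4.6) in which the invariant subspaces $\mathsf{L}_i$ are
one-dimensional, that is, $\mathsf{L}_i = \langle\boldsymbol{e}_i\rangle$. Such linear transformations are called
diagonalizable.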

As the following example shows, not all transformations are diagonalizable.

Example 4.13 Let $\mathcal{A}$ be a linear transformation of the (real or complex) plane that
in some basis $\boldsymbol{e}_1, \boldsymbol{e}_2$ has the matrix

$$A = \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}, \quad b \neq 0.$$

The characteristic polynomial $|A - tE| = (t - a)^2$ of this transformation has a
unique zero $t = a$, of multiplicity 2, to which corresponds the one-dimensional
eigensubspace $\langle\boldsymbol{e}_1\rangle$. From this it follows that the transformation $\mathcal{A}$ is
nondiagonalizable.

This can be proved by another method, using the concept of similar matrices.
If the transformation $\mathcal{A}$ were diagonalizable, then, since $a$ is its only eigenvalue,
there would exist a nonsingular matrix $C$ of order 2 that would satisfy the relation
$C^{-1}AC = aE$, or equivalently, the equation $AC = aC$. With respect to the unknown
elements of the matrix $C = (c_{ij})$, the previous equality gives us two equations,
$bc_{21} = 0$ and $bc_{22} = 0$, whence by virtue of $b \neq 0$, it follows that $c_{21} = c_{22} = 0$,
and the matrix $C$ is thus seen to be singular.
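The two equations used in this argument can be read off from a direct computation of $AC - aC$. The following sketch uses arbitrary integer test values and hypothetical helper names; it only illustrates that the first row of $AC - aC$ is $(bc_{21}, bc_{22})$ and the second row is zero.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def residual(a, b, c):
    """Entries of AC - aC for A = [[a, b], [0, a]] and a 2x2 matrix C."""
    big_a = [[a, b], [0, a]]
    ac = matmul(big_a, c)
    return [[ac[i][j] - a * c[i][j] for j in range(2)] for i in range(2)]


# With a = 5, b = 7 and C = [[1, 2], [3, 4]] the residual is
# [[b*c21, b*c22], [0, 0]] = [[7*3, 7*4], [0, 0]].
C = [[1, 2], [3, 4]]
print(residual(5, 7, C))  # [[21, 28], [0, 0]]
```

For the residual to vanish with $b \neq 0$, the second row of $C$ must be zero, which is exactly the singularity observed above.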

We have seen that the number of eigenvalues of a linear transformation is finite,
and it cannot exceed the number $n$ (the dimension of the space $\mathsf{L}$), since they are the
zeros of the characteristic polynomial, whose degree is $n$.

Theorem 4.14 The dimension of the eigensubspace $\mathsf{L}_\lambda \subset \mathsf{L}$ associated with the
eigenvalue $\lambda$ is at most the multiplicity of the value $\lambda$ as a zero of the
characteristic polynomial.


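Proof Suppose the dimension of the eigensubspace $\mathsf{L}_\lambda$ is $m$. Let us choose a basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ of this subspace and extend it to a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the entire space
$\mathsf{L}$, in which the matrix of the transformation $\mathcal{A}$ has the form (4.5). Since by the
definition of an eigensubspace, $\mathcal{A}(\boldsymbol{e}_i) = \lambda\boldsymbol{e}_i$ for all $i = 1, \ldots, m$, it follows that in
(4.5), the matrix $A'$ is equal to $\lambda E_m$, where $E_m$ is the identity matrix of order $m$.
Then

$$A - tE = \begin{pmatrix} A' - tE_m & B' \\ 0 & C' - tE_{n-m} \end{pmatrix} = \begin{pmatrix} (\lambda - t)E_m & B' \\ 0 & C' - tE_{n-m} \end{pmatrix},$$

where $E_{n-m}$ is the identity matrix of order $n - m$. Therefore,

$$|A - tE| = (\lambda - t)^m\, |C' - tE_{n-m}|,$$

so that $\lambda$ is a zero of the characteristic polynomial of multiplicity at least $m$,
which is what we had to show.

If, in addition, $\mathsf{L} = \mathsf{L}_\lambda \oplus \mathsf{L}''$ for some invariant subspace $\mathsf{L}''$ and the vectors
$\boldsymbol{e}_{m+1}, \ldots, \boldsymbol{e}_n$ are chosen in $\mathsf{L}''$, then $C'$ is the matrix of the restriction of the
transformation $\mathcal{A}$ to $\mathsf{L}''$. Since $\mathsf{L}_\lambda \cap \mathsf{L}'' = (\boldsymbol{0})$, this restriction has no eigenvectors
with eigenvalue $\lambda$. This means that $|C' - \lambda E_{n-m}| \neq 0$, that is, the number $\lambda$ is not
a zero of the polynomial $|C' - tE_{n-m}|$, and the multiplicity of $\lambda$ is then exactly $m$. □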


In the previous chapter we were introduced to the operations of addition and
multiplication (composition) of linear transformations, which are clearly defined
for the special case of a transformation of a space $\mathsf{L}$ into itself. Therefore, for any
integer $n \ge 0$ we may define the $n$th power of a linear transformation. By definition,
$\mathcal{A}^n$ for $n > 0$ is the result of multiplying $\mathcal{A}$ by itself $n$ times, and for $n = 0$, $\mathcal{A}^0$ is the
identity transformation $\mathcal{E}$. This enables us to introduce the concept of a polynomial
in a linear transformation, which will play an important role in what follows.

Let $\mathcal{A}$ be a linear transformation of the vector space $\mathsf{L}$ (real, complex, or over an
arbitrary field $\mathbb{K}$) and let

$$f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_k x^k$$

be a polynomial with scalar coefficients (respectively real, complex, or from the
field $\mathbb{K}$).

Definition 4.15 A polynomial $f$ in the linear transformation $\mathcal{A}$ is the linear mapping

$$f(\mathcal{A}) = \alpha_0\mathcal{E} + \alpha_1\mathcal{A} + \cdots + \alpha_k\mathcal{A}^k, \tag{4.10}$$

where $\mathcal{E}$ is the identity linear transformation.

We observe that this definition does not make use of coordinates, that is, the
choice of a specific basis in the space $\mathsf{L}$. If such a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ is chosen, then to
the linear transformation $\mathcal{A}$ there corresponds a unique square matrix $A$. In Sect. 2.9
we introduced the notion of a polynomial in a square matrix, which allows us to give
another definition: $f(\mathcal{A})$ is the linear transformation with matrix

$$f(A) = \alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k \tag{4.11}$$

in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$.

It is not difficult to be convinced of the equivalence of these definitions if we
recall that the actions of linear transformations are expressed through the actions
of their matrices (see Sect. 3.3). It is thus necessary to show that in a change of
basis from $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, the matrix $f(A)$ also changes according to formula (4.3)
with the same transition matrix $C$ as for the matrix $A$. Indeed, let us consider a change of
coordinates (that is, switching to another basis of the space $\mathsf{L}$) with matrix $C$. Then in
the new basis, the matrix of the transformation $\mathcal{A}$ is given by $A' = C^{-1}AC$. By the
associativity of matrix multiplication, we also obtain the relationship $A'^n = C^{-1}A^nC$
for every integer $n \ge 0$. If we substitute $A'$ for $A$ in formula (4.11), then considering
what we have said, we obtain

$$f(A') = \alpha_0 E + \alpha_1 A' + \cdots + \alpha_k A'^k = C^{-1}(\alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k)C = C^{-1} f(A)\, C,$$

which proves our assertion.
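The relationship $f(A') = C^{-1}f(A)C$ can be confirmed on a concrete example with exact integer arithmetic. In the sketch below, the matrices, the polynomial, and the helper names are assumptions chosen for illustration; the transition matrix is taken with determinant 1 so that its inverse also has integer entries.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_of_matrix(coeffs, a):
    """f(A) = coeffs[0]*E + coeffs[1]*A + ... evaluated by Horner's scheme."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, a)
        for i in range(n):
            result[i][i] += c
    return result


A = [[1, 2], [3, 4]]
C = [[2, 1], [1, 1]]        # determinant 1
C_inv = [[1, -1], [-1, 2]]  # inverse of C
A_new = matmul(matmul(C_inv, A), C)  # A' = C^{-1} A C as in (4.3)

f = [5, -3, 2]  # f(x) = 5 - 3x + 2x^2
lhs = poly_of_matrix(f, A_new)
rhs = matmul(matmul(C_inv, poly_of_matrix(f, A)), C)
print(lhs == rhs)  # True
```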

It should be clear that the statements that we proved in Sect. 2.9 for polynomials
in a matrix (p. 69) also apply to polynomials in a linear transformation.

Lemma 4.16 If $f(x) + g(x) = u(x)$ and $f(x)g(x) = v(x)$, then for an arbitrary
linear transformation $\mathcal{A}$, we have

$$f(\mathcal{A}) + g(\mathcal{A}) = u(\mathcal{A}), \tag{4.12}$$

$$f(\mathcal{A})\,g(\mathcal{A}) = v(\mathcal{A}). \tag{4.13}$$

Corollary 4.17 Polynomials $f(\mathcal{A})$ and $g(\mathcal{A})$ in the same linear transformation $\mathcal{A}$
commute: $f(\mathcal{A})g(\mathcal{A}) = g(\mathcal{A})f(\mathcal{A})$.


4.2 Complex and Real Vector Spaces 

We shall now investigate in greater detail the concepts introduced in the previous
section applied to transformations of complex and real vector spaces (that is, we
shall assume that the field $\mathbb{K}$ is respectively $\mathbb{C}$ or $\mathbb{R}$). Our fundamental result applies
specifically to complex spaces.

Theorem 4.18 Every linear transformation of a complex vector space has an eigen- 
vector. 

This follows immediately from the fact that the characteristic polynomial of a 
linear transformation, and in general an arbitrary polynomial of positive degree, has 
a complex root. Nevertheless, as Example 4.13 of the previous section shows, even 
in a complex space, not every linear transformation is diagonalizable. 

Let us consider the question of diagonalizability in greater detail, always assum- 
ing that we are working with complex spaces. We shall prove the diagonalizability 
of a commonly occurring type of transformation. To this end, we require the follow- 
ing lemma. 

Lemma 4.19 Eigenvectors associated with distinct eigenvalues are linearly inde- 
pendent. 

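Proof Suppose the eigenvectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ are associated with distinct eigenvalues
$\lambda_1, \ldots, \lambda_m$,

$$\mathcal{A}(\boldsymbol{e}_i) = \lambda_i\boldsymbol{e}_i, \quad i = 1, \ldots, m.$$

We shall prove the lemma by induction on the number $m$ of vectors. For the case
$m = 1$, the result follows from the definition of an eigenvector, namely that $\boldsymbol{e}_1 \neq \boldsymbol{0}$.
Let us assume that there exists a linear dependence

$$\alpha_1\boldsymbol{e}_1 + \alpha_2\boldsymbol{e}_2 + \cdots + \alpha_m\boldsymbol{e}_m = \boldsymbol{0}. \tag{4.14}$$

Applying the transformation $\mathcal{A}$ to both sides of the equation, we obtain

$$\lambda_1\alpha_1\boldsymbol{e}_1 + \lambda_2\alpha_2\boldsymbol{e}_2 + \cdots + \lambda_m\alpha_m\boldsymbol{e}_m = \boldsymbol{0}. \tag{4.15}$$

Subtracting (4.14) multiplied by $\lambda_m$ from (4.15), we obtain

$$\alpha_1(\lambda_1 - \lambda_m)\boldsymbol{e}_1 + \alpha_2(\lambda_2 - \lambda_m)\boldsymbol{e}_2 + \cdots + \alpha_{m-1}(\lambda_{m-1} - \lambda_m)\boldsymbol{e}_{m-1} = \boldsymbol{0}.$$

By our induction hypothesis, we may consider that the lemma has been proved
for the $m - 1$ vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{m-1}$. Thus we obtain that $\alpha_1(\lambda_1 - \lambda_m) = 0, \ldots,$
$\alpha_{m-1}(\lambda_{m-1} - \lambda_m) = 0$, and since by the condition in the lemma, $\lambda_1 \neq \lambda_m, \ldots,$
$\lambda_{m-1} \neq \lambda_m$, it follows that $\alpha_1 = \cdots = \alpha_{m-1} = 0$. Substituting this into (4.14), we
arrive at the relationship $\alpha_m\boldsymbol{e}_m = \boldsymbol{0}$, that is (since $\boldsymbol{e}_m \neq \boldsymbol{0}$ by the definition of an
eigenvector), $\alpha_m = 0$. Therefore, in (4.14), all the $\alpha_i$ are equal to zero, which
demonstrates the linear independence of $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$. □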

By Lemma 4.19, we have the following result.

Theorem 4.20 A linear transformation on a complex vector space is diagonalizable
if its characteristic polynomial has no multiple roots.

As is well known, in this case, the characteristic polynomial has $n$ distinct roots
(we recall once again that we are speaking about polynomials over the field of
complex numbers).

Proof of Theorem 4.20 Let $\lambda_1, \ldots, \lambda_n$ be the distinct roots of the characteristic
polynomial of the transformation $\mathcal{A}$ and let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ be the corresponding
eigenvectors. It suffices to show that these vectors form a basis of the entire space. Since
their number is equal to the dimension of the space, this is equivalent to showing
their linear independence, which follows from Lemma 4.19. □

If $A$ is the matrix of the transformation $\mathcal{A}$ in some basis, then the condition of
Theorem 4.20 is satisfied if and only if the so-called discriminant of the
characteristic polynomial is nonzero.¹ For example, if the order of a matrix $A$ is 2, and

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

then

$$|A - tE| = \begin{vmatrix} a - t & b \\ c & d - t \end{vmatrix} = (a - t)(d - t) - bc = t^2 - (a + d)t + ad - bc.$$

The condition that this quadratic trinomial have two distinct roots is that
$(a + d)^2 - 4(ad - bc) \neq 0$. This can be rewritten in the form

$$(a - d)^2 + 4bc \neq 0. \tag{4.16}$$
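Condition (4.16) is simple enough to test directly. The function name `discriminant` below is a hypothetical helper for this illustration; it returns the left-hand side of (4.16) for the matrix with rows $(a, b)$ and $(c, d)$.

```python
def discriminant(a, b, c, d):
    """Left-hand side of (4.16) for the matrix [[a, b], [c, d]]."""
    return (a - d) ** 2 + 4 * b * c


print(discriminant(1, 2, 3, 4))   # 33: two distinct roots
print(discriminant(5, 7, 0, 5))   # 0: the matrix of Example 4.13 with a = 5, b = 7
print(discriminant(0, -1, 1, 0))  # -4: rotation through a right angle
```

In the last case the value is nonzero, so over the complex numbers the characteristic polynomial $t^2 + 1$ has two distinct roots, although it has no real roots.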


¹ For the general notion of the discriminant of a polynomial, see, for instance, Polynomials, by
Victor V. Prasolov, Springer 2004.

Similarly, for complex vector spaces of arbitrary dimension, linear transformations
not satisfying the conditions of Theorem 4.20 have a matrix that, regardless
of the basis, has elements that satisfy a special algebraic relationship. In this sense,
only exceptional transformations do not meet the conditions of Theorem 4.20.

Analogous considerations give necessary and sufficient conditions for a linear
transformation to be diagonalizable.

Theorem 4.21 A linear transformation of a complex vector space is diagonalizable
if and only if for each of its eigenvalues $\lambda$, the dimension of the corresponding
eigenspace $\mathsf{L}_\lambda$ is equal to the multiplicity of $\lambda$ as a root of the characteristic
polynomial.

In other words, the bound on the dimension of the subspace $\mathsf{L}_\lambda$ obtained in
Theorem 4.14 is attained.


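Proof of Theorem 4.21 Let the transformation $\mathcal{A}$ be diagonalizable, that is, in some
basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ it has the matrix

$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

It is possible to arrange the eigenvalues $\lambda_1, \ldots, \lambda_n$ so that those that are equal are
next to each other, so that altogether, they have the form

$$\underbrace{\lambda_1, \ldots, \lambda_1}_{m_1 \text{ times}},\ \underbrace{\lambda_2, \ldots, \lambda_2}_{m_2 \text{ times}},\ \ldots,\ \underbrace{\lambda_k, \ldots, \lambda_k}_{m_k \text{ times}},$$

where all the numbers $\lambda_1, \ldots, \lambda_k$ are distinct. In other words, we can write the
matrix $A$ in the block-diagonal form

$$A = \begin{pmatrix} \lambda_1 E_{m_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 E_{m_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k E_{m_k} \end{pmatrix}, \tag{4.17}$$

where $E_{m_i}$ is the identity matrix of order $m_i$. Then

$$|A - tE| = (\lambda_1 - t)^{m_1} (\lambda_2 - t)^{m_2} \cdots (\lambda_k - t)^{m_k},$$

that is, the number $\lambda_i$ is a root of multiplicity $m_i$ of the characteristic equation.
On the other hand, the equality $\mathcal{A}(\boldsymbol{x}) = \lambda_i\boldsymbol{x}$ for a vector $\boldsymbol{x} = \alpha_1\boldsymbol{e}_1 + \cdots + \alpha_n\boldsymbol{e}_n$
gives the relationship $\lambda_s\alpha_j = \lambda_i\alpha_j$ for all $j = 1, \ldots, n$, where $\lambda_s$ is the eigenvalue
corresponding to the vector $\boldsymbol{e}_j$; that is, either $\alpha_j = 0$ or $\lambda_s = \lambda_i$. In other words,
the vector $\boldsymbol{x}$ is a linear combination only
of those eigenvectors $\boldsymbol{e}_j$ that correspond to the eigenvalue $\lambda_i$. This means that the
subspace $\mathsf{L}_{\lambda_i}$ consists of all linear combinations of such vectors, and consequently,
$\dim \mathsf{L}_{\lambda_i} = m_i$.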

Conversely, for the distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, let the dimension of the
eigensubspace $\mathsf{L}_{\lambda_i}$ be equal to the multiplicity $m_i$ of the number $\lambda_i$ as a root of the
characteristic polynomial. Then from known properties of polynomials, it follows that
$m_1 + \cdots + m_k = n$, which means that

$$\dim \mathsf{L}_{\lambda_1} + \cdots + \dim \mathsf{L}_{\lambda_k} = \dim \mathsf{L}. \tag{4.18}$$

We shall show that the sum $\mathsf{L}_{\lambda_1} + \cdots + \mathsf{L}_{\lambda_k}$ is a direct sum of the eigensubspaces
$\mathsf{L}_{\lambda_i}$. To do so, it suffices to show that for all vectors $\boldsymbol{x}_1 \in \mathsf{L}_{\lambda_1}, \ldots, \boldsymbol{x}_k \in \mathsf{L}_{\lambda_k}$, the
equality $\boldsymbol{x}_1 + \cdots + \boldsymbol{x}_k = \boldsymbol{0}$ is possible only in the case that $\boldsymbol{x}_1 = \cdots = \boldsymbol{x}_k = \boldsymbol{0}$. But
since the nonzero vectors among $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_k$ are eigenvectors of the transformation $\mathcal{A}$
corresponding to distinct eigenvalues, the required assertion follows by Lemma 4.19. Therefore,
by equality (4.18), we have the decomposition

$$\mathsf{L} = \mathsf{L}_{\lambda_1} \oplus \cdots \oplus \mathsf{L}_{\lambda_k}.$$

Having chosen from each eigensubspace $\mathsf{L}_{\lambda_i}$, $i = 1, \ldots, k$, a basis (consisting of $m_i$
vectors), and having ordered them in such a way that the vectors entering into a
particular subspace $\mathsf{L}_{\lambda_i}$ are adjacent, we obtain a basis of the space $\mathsf{L}$ in which the
matrix $A$ of the transformation $\mathcal{A}$ has the form (4.17). This means that the
transformation $\mathcal{A}$ is diagonalizable. □
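The criterion of Theorem 4.21 can be illustrated by comparing eigensubspace dimensions for two matrices with the same characteristic polynomial. This sketch assumes the standard fact that the solution space of the homogeneous system (4.9) has dimension $n$ minus the rank of $A - \lambda E$; the function names and the test matrices are hypothetical choices for this example, and the rank is computed by Gaussian elimination over the rationals.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def eigenspace_dimension(a, lam):
    """Dimension of the solution space of system (4.9): n - rank(A - lam*E)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


# Both matrices have the characteristic polynomial (2 - t)^2 (3 - t).
diagonal = [[2, 0, 0], [0, 2, 0], [0, 0, 3]]
jordan = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]

print(eigenspace_dimension(diagonal, 2), eigenspace_dimension(diagonal, 3))  # 2 1
print(eigenspace_dimension(jordan, 2), eigenspace_dimension(jordan, 3))      # 1 1
```

For the second matrix the eigenvalue 2 has multiplicity 2 but a one-dimensional eigensubspace, so by Theorem 4.21 the corresponding transformation is not diagonalizable.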

The case of real vector spaces is more frequently encountered in applications. 
Their study proceeds in almost the same way as with complex vector spaces, except 
that the results are somewhat more complicated. We shall introduce here a proof of 
the real analogue of Theorem 4.18. 

Theorem 4.22 Every linear transformation of a real vector space of dimension 
n > 2 has either a one-dimensional or two-dimensional invariant subspace. 

Proof Let $\mathcal{A}$ be a linear transformation of a real vector space $\mathsf{L}$ of dimension
$n > 2$, and let $\boldsymbol{x} \in \mathsf{L}$ be some nonnull vector. Since the collection
$\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}), \mathcal{A}^2(\boldsymbol{x}), \ldots, \mathcal{A}^n(\boldsymbol{x})$
consists of $n + 1 > \dim \mathsf{L}$ vectors, then by the definition of the dimension
of a vector space, these vectors must be linearly dependent. This means that there
exist real numbers $\alpha_0, \alpha_1, \ldots, \alpha_n$, not all zero, such that

$$\alpha_0 \boldsymbol{x} + \alpha_1 \mathcal{A}(\boldsymbol{x}) + \alpha_2 \mathcal{A}^2(\boldsymbol{x}) + \cdots + \alpha_n \mathcal{A}^n(\boldsymbol{x}) = \boldsymbol{0}. \tag{4.19}$$

Consider the polynomial $P(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n$ and substitute for the variable
$t$ the transformation $\mathcal{A}$, as was done in Sect. 4.1 (formula (4.10)). Then the equality
(4.19) can be written in the form

$$P(\mathcal{A})(\boldsymbol{x}) = \boldsymbol{0}. \tag{4.20}$$

A polynomial $P(t)$ satisfying equality (4.20) is called an annihilator polynomial of
the vector $\boldsymbol{x}$ (where it is implied that it is relative to the given transformation $\mathcal{A}$).

Let us assume that the annihilator polynomial $P(t)$ of some vector $\boldsymbol{x} \ne \boldsymbol{0}$ is the
product of two polynomials of lower degree: $P(t) = Q_1(t) Q_2(t)$. Then by definition
(4.20) and formula (4.13) from the previous section, we have
$Q_1(\mathcal{A}) Q_2(\mathcal{A})(\boldsymbol{x}) = \boldsymbol{0}$.
Then either $Q_2(\mathcal{A})(\boldsymbol{x}) = \boldsymbol{0}$, and hence the vector $\boldsymbol{x}$ is annihilated by an
annihilator polynomial $Q_2(t)$ of lower degree, or else $Q_2(\mathcal{A})(\boldsymbol{x}) \ne \boldsymbol{0}$. In the latter case, setting
$\boldsymbol{y} = Q_2(\mathcal{A})(\boldsymbol{x})$, we obtain the equality $Q_1(\mathcal{A})(\boldsymbol{y}) = \boldsymbol{0}$, which means that the
nonnull vector $\boldsymbol{y}$ is annihilated by the annihilator polynomial $Q_1(t)$ of lower degree. As
is well known, an arbitrary polynomial with real coefficients is a product of
polynomials of first and second degree. Applying to $P(t)$ as many times as necessary the
process described above, we finally arrive at a polynomial $Q(t)$ of first or second
degree and a nonnull vector $\boldsymbol{z}$ such that $Q(\mathcal{A})(\boldsymbol{z}) = \boldsymbol{0}$. This is the real analogue of
Theorem 4.18.

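Factoring out the coefficient of the high-order term of $Q(t)$, we may assume that
this coefficient is equal to 1. If the degree of $Q(t)$ is equal to 1, then $Q(t) = t - \lambda$ for
some $\lambda$, and the equality $Q(\mathcal{A})(\boldsymbol{z}) = \boldsymbol{0}$ yields
$(\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{z}) = \boldsymbol{0}$. This means that
$\boldsymbol{z}$ is an eigenvector of the transformation $\mathcal{A}$ with eigenvalue $\lambda$, and therefore,
$\langle \boldsymbol{z} \rangle$ is a one-dimensional invariant subspace of the transformation $\mathcal{A}$.

If the degree of $Q(t)$ is equal to 2, then $Q(t) = t^2 + pt + q$ and
$(\mathcal{A}^2 + p\mathcal{A} + q\mathcal{E})(\boldsymbol{z}) = \boldsymbol{0}$. In this case, the subspace
$\mathsf{L}' = \langle \boldsymbol{z}, \mathcal{A}(\boldsymbol{z}) \rangle$ is two-dimensional and is
invariant with respect to $\mathcal{A}$. Indeed, the vectors $\boldsymbol{z}$ and $\mathcal{A}(\boldsymbol{z})$ are linearly independent,
since otherwise, we would have the case of an eigenvector $\boldsymbol{z}$ considered above. This
means that $\dim \mathsf{L}' = 2$. We shall show that $\mathsf{L}'$ is an invariant subspace of the
transformation $\mathcal{A}$. Let $\boldsymbol{x} = \alpha\boldsymbol{z} + \beta\mathcal{A}(\boldsymbol{z})$. To show that
$\mathcal{A}(\boldsymbol{x}) \in \mathsf{L}'$, it suffices to verify that the
vectors $\mathcal{A}(\boldsymbol{z})$ and $\mathcal{A}(\mathcal{A}(\boldsymbol{z}))$ belong to $\mathsf{L}'$. This holds for the former by the definition
of $\mathsf{L}'$. It holds for the latter by the fact that $\mathcal{A}(\mathcal{A}(\boldsymbol{z})) = \mathcal{A}^2(\boldsymbol{z})$ and by the equality
$\mathcal{A}^2(\boldsymbol{z}) + p\mathcal{A}(\boldsymbol{z}) + q\boldsymbol{z} = \boldsymbol{0}$, that is,
$\mathcal{A}^2(\boldsymbol{z}) = -q\boldsymbol{z} - p\mathcal{A}(\boldsymbol{z})$. □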

Let us discuss the concept of the annihilator polynomial that we encountered in
the proof of Theorem 4.22. An annihilator polynomial of a vector $\boldsymbol{x} \ne \boldsymbol{0}$ having
minimal degree is called a minimal polynomial of the vector $\boldsymbol{x}$.

Theorem 4.23 Every annihilator polynomial is divisible by a minimal polynomial. 

Proof Let $P(t)$ be an annihilator polynomial of the vector $\boldsymbol{x} \ne \boldsymbol{0}$, and $Q(t)$ a
minimal polynomial. Let us suppose that $P$ is not divisible by $Q$. We divide $P$ by $Q$ with
remainder. This gives the equality $P = UQ + R$, where $U$ and $R$ are polynomials
in $t$, and moreover, $R$ is not identically zero, and the degree of $R$ is less than that
of $Q$. If we substitute into this equality the transformation $\mathcal{A}$ for the variable $t$, then
by formulas (4.12) and (4.13), we obtain that

$$P(\mathcal{A})(\boldsymbol{x}) = U(\mathcal{A}) Q(\mathcal{A})(\boldsymbol{x}) + R(\mathcal{A})(\boldsymbol{x}), \tag{4.21}$$

and since $P$ and $Q$ are annihilator polynomials of the vector $\boldsymbol{x}$, it follows that
$R(\mathcal{A})(\boldsymbol{x}) = \boldsymbol{0}$. Since the degree of $R$ is less than that of $Q$, this contradicts the
minimality of the polynomial $Q$. □

Corollary 4.24 The minimal polynomial of a vector is uniquely defined up to
a constant factor.
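
The definition of a minimal polynomial can be made concrete with a small computation. The following is only a sketch under stated assumptions: the transformation is given by a square matrix with rational entries, the vector is nonnull, and the helper names (`mat_vec`, `solve_combination`, `minimal_polynomial_of_vector`) are illustrative, not notation from the text. It looks for the first power $\mathcal{A}^d(\boldsymbol{x})$ that is a linear combination of $\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}), \ldots, \mathcal{A}^{d-1}(\boldsymbol{x})$ and returns the corresponding polynomial with leading coefficient 1.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Apply the square matrix A (list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve_combination(vectors, target):
    """Coefficients expressing target through vectors, or None if impossible.

    Gauss-Jordan elimination over exact fractions; column j of the system
    is vectors[j], and the right-hand side is target.
    """
    n, m = len(target), len(vectors)
    rows = [[Fraction(vectors[j][i]) for j in range(m)] + [Fraction(target[i])]
            for i in range(n)]
    pivots, r = [], 0
    for c in range(m):
        p = next((i for i in range(r, n) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(n):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # A zero row with nonzero right-hand side means there is no solution.
    if any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in rows):
        return None
    coeffs = [Fraction(0)] * m
    for i, c in enumerate(pivots):
        coeffs[c] = rows[i][-1]
    return coeffs

def minimal_polynomial_of_vector(A, x):
    """Monic minimal polynomial of a nonnull vector x, constant term first."""
    powers = [list(x)]                      # x, A(x), A^2(x), ...
    while True:
        nxt = mat_vec(A, powers[-1])
        coeffs = solve_combination(powers, nxt)
        if coeffs is not None:
            # A^d(x) = sum c_i A^i(x), so t^d - sum c_i t^i annihilates x.
            return [-c for c in coeffs] + [Fraction(1)]
        powers.append(nxt)

# Rotation through a right angle: no real eigenvectors, minimal polynomial t^2 + 1.
rotation = [[0, -1], [1, 0]]
print(minimal_polynomial_of_vector(rotation, [1, 0]))
```

For a diagonal matrix with diagonal entries 2 and 3, the same function gives $t - 2$ for the vector $(1, 0)$ and $t^2 - 5t + 6$ for the vector $(1, 1)$, which illustrates that the minimal polynomial depends on the vector and not only on the transformation.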

Let us note that a converse of Theorem 4.23 also holds: any multiple of any
annihilator polynomial is also an annihilator polynomial
(of course, of the same vector $\boldsymbol{x}$). This follows from the fact that in this case, in
equality (4.21), we have $R = 0$. From this follows the assertion that there exists a
single polynomial that is an annihilator for all vectors of the space $\mathsf{L}$. Indeed, let
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ be some basis of the space $\mathsf{L}$, and let $P_1, \ldots, P_n$ be annihilator
polynomials for these vectors. Let us denote by $Q$ the least common multiple of these
polynomials. Then from what we have said above, it follows that $Q$ is an
annihilator polynomial for each of the vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$; that is,
$Q(\mathcal{A})(\boldsymbol{e}_i) = \boldsymbol{0}$ for all
$i = 1, \ldots, n$. We shall prove that $Q$ is an annihilator polynomial for every
vector $\boldsymbol{x} \in \mathsf{L}$. By definition, $\boldsymbol{x}$ is a linear combination of vectors of a basis, that is,
$\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \alpha_2 \boldsymbol{e}_2 + \cdots + \alpha_n \boldsymbol{e}_n$. Then

$$Q(\mathcal{A})(\boldsymbol{x}) = Q(\mathcal{A})(\alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n)
= \alpha_1 Q(\mathcal{A})(\boldsymbol{e}_1) + \cdots + \alpha_n Q(\mathcal{A})(\boldsymbol{e}_n)
= \boldsymbol{0}.$$

Definition 4.25 A polynomial that annihilates every vector of a space $\mathsf{L}$ is called an
annihilator polynomial of this space (keeping in mind that we mean for the given
linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$).

In conclusion, let us compare the arguments used in the proofs of Theorems 4.18 
and 4.22. In the first case, we relied on the existence of a root (that is, a factor of 
degree 1) of the characteristic polynomial, while in the latter case, we required the 
existence of a simplest factor (of degree 1 or 2) for the annihilator polynomial. The 
connection between these polynomials relies on a result that is important in and of
itself. It is called the Cayley-Hamilton theorem.

Theorem 4.26 The characteristic polynomial is an annihilator polynomial for its 
associated vector space. 

The proof of this theorem is based on arguments analogous to those used in the 
proof of Lemma 4.19, but relating to a much more general situation. We shall now 
consider polynomials in the variable t whose coefficients are not numbers, but linear 
transformations of the vector space $\mathsf{L}$ into itself or (which is the same thing if some
fixed basis has been chosen in $\mathsf{L}$) square matrices $P_i$:

$$P(t) = P_0 + P_1 t + \cdots + P_k t^k.$$

One can work with these as with ordinary polynomials if one assumes that the
variable $t$ commutes with the coefficients. It is also possible to substitute for $t$ the matrix
$A$ of a linear transformation. We shall denote the result of this substitution by $P(A)$,
that is,

$$P(A) = P_0 + P_1 A + \cdots + P_k A^k.$$

It is important here that $t$ and $A$ are written to the right of the coefficients $P_i$. Further,
we shall consider the situation in which $P_i$ and $A$ are square matrices of one and the
same order. In view of what we have said above, all assertions will be true as well
for the case that in the last formula, instead of the matrices $P_i$ and $A$ we have the
linear transformations $\mathcal{P}_i$ and $\mathcal{A}$ of some vector space $\mathsf{L}$ into itself:

$$P(\mathcal{A}) = \mathcal{P}_0 + \mathcal{P}_1 \mathcal{A} + \cdots + \mathcal{P}_k \mathcal{A}^k.$$

However, in this case, the analogue of formula (4.13) from Sect. 4.1 does not
hold: if the polynomial $R(t)$ is equal to $P(t)Q(t)$ and $A$ is the matrix of
an arbitrary linear transformation of the vector space $\mathsf{L}$, then generally speaking,
$R(A) \ne P(A)Q(A)$. For example, if we have polynomials $P = P_1 t$ and $Q = Q_0$,
then $P_1 t Q_0 = P_1 Q_0 t$, but it is not true that $P_1 A Q_0 = P_1 Q_0 A$ for an arbitrary matrix
$A$, since the matrices $A$ and $Q_0$ do not necessarily commute. However, there is one
important special case in which formula (4.13) holds.
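
The failure of formula (4.13) for matrix coefficients is easy to observe numerically. The sketch below uses hypothetical $2 \times 2$ matrices chosen only for illustration (the names `P1`, `Q0`, `A`, and `mat_mul` are not from the text); it compares $R(A) = P_1 Q_0 A$ with $P(A)Q(A) = P_1 A Q_0$ for the polynomials $P = P_1 t$ and $Q = Q_0$.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P1 = [[1, 0], [0, 1]]   # coefficient of t in P(t) = P1 t (illustrative choice)
Q0 = [[0, 1], [0, 0]]   # the constant polynomial Q(t) = Q0 (illustrative choice)
A = [[1, 0], [0, 2]]    # matrix substituted for t; it does not commute with Q0

# R(t) = P(t) Q(t) = (P1 Q0) t, so R(A) = P1 Q0 A, whereas P(A) Q(A) = P1 A Q0.
R_at_A = mat_mul(mat_mul(P1, Q0), A)
PQ_at_A = mat_mul(mat_mul(P1, A), Q0)

print(R_at_A)    # [[0, 2], [0, 0]]
print(PQ_at_A)   # [[0, 1], [0, 0]]
```

If `A` is replaced by a scalar matrix such as `[[3, 0], [0, 3]]`, which commutes with `Q0`, the two results coincide.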

Lemma 4.27 Let

$$P(t) = P_0 + P_1 t + \cdots + P_k t^k, \qquad Q(t) = Q_0 + Q_1 t + \cdots + Q_l t^l,$$

and suppose that the polynomial $R(t)$ equals $P(t)Q(t)$. Then $R(A) = P(A)Q(A)$
if the matrix $A$ commutes with every coefficient of the polynomial $Q(t)$, that is,
$A Q_i = Q_i A$ for all $i = 0, \ldots, l$.

Proof It is not difficult to see that the polynomial $R(t) = P(t)Q(t)$ can be
represented in the form $R(t) = R_0 + R_1 t + \cdots + R_{k+l} t^{k+l}$ with coefficients
$R_s = \sum_{i=0}^{s} P_i Q_{s-i}$, where $P_i = 0$ if $i > k$, and $Q_i = 0$ if $i > l$. Similarly, the
product $P(A)Q(A)$ can be expressed in the form

$$P(A)Q(A) = \sum_{s=0}^{k+l} \left( \sum_{i=0}^{s} P_i A^i Q_{s-i} A^{s-i} \right)$$

with the same conditions: $P_i = 0$ if $i > k$, and $Q_i = 0$ if $i > l$. By the condition of
the lemma, $A Q_j = Q_j A$, whence by induction, we easily obtain that $A^i Q_j = Q_j A^i$
for every choice of $i$ and $j$. Thus our expression takes the form

$$P(A)Q(A) = \sum_{s=0}^{k+l} \left( \sum_{i=0}^{s} P_i Q_{s-i} \right) A^s = R(A).$$

□

Of course, the analogous assertion holds for all polynomials for which the
variable $t$ stands to the left of the coefficients (then the matrix $A$ must commute with
every coefficient of the polynomial $P$, and not $Q$).

Using Lemma 4.27, we can prove the Cayley-Hamilton theorem. 

Proof of Theorem 4.26 Let us consider the matrix $tE - A$ and denote its determinant
by $\varphi(t) = |tE - A|$. The coefficients of the polynomial $\varphi(t)$ are numbers, and as is
easily seen, it is equal to the characteristic polynomial of the matrix $A$ multiplied by $(-1)^n$
(in order to make the coefficient of $t^n$ equal to 1). Let us denote by $B(t)$ the adjugate
matrix to $tE - A$ (see the definition on p. 73). It is clear that $B(t)$ will contain as
its elements certain polynomials in $t$ of degree at most $n - 1$, and consequently, we
may write it in the form $B(t) = B_0 + B_1 t + \cdots + B_{n-1} t^{n-1}$, where the $B_i$ are certain
matrices. Formula (2.70) for the adjugate matrix yields

$$B(t)(tE - A) = \varphi(t)E. \tag{4.22}$$

Let us substitute into formula (4.22) in place of the variable $t$ the matrix $A$ of the
linear transformation $\mathcal{A}$ with respect to some basis of the vector space $\mathsf{L}$. Since the
matrix $A$ commutes with the identity matrix $E$ and with itself, then by Lemma 4.27,
we obtain the matrix equality $B(A)(AE - A) = \varphi(A)E$, the left-hand side of which
is equal to the null matrix. It is clear that in an arbitrary basis, the null matrix is the
matrix of the null transformation $\mathcal{O} : \mathsf{L} \to \mathsf{L}$, and consequently,
$\varphi(\mathcal{A}) = \mathcal{O}$. And this
is the assertion of Theorem 4.26. □
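
The theorem can be checked on a concrete matrix. The sketch below does not follow the proof above; it computes the coefficients of $\varphi(t) = |tE - A|$ by the Faddeev-LeVerrier recurrence (a technique swapped in here because it needs only matrix products and traces) and then evaluates $\varphi(A)$ by Horner's rule. The function names and the sample matrix are illustrative assumptions, and exact rational arithmetic is used so that the zero matrix is obtained exactly.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of phi(t) = |tE - A|.

    Faddeev-LeVerrier recurrence: M_1 = E, c_{n-k} = -tr(A M_k) / k,
    M_{k+1} = A M_k + c_{n-k} E.
    """
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k
        coeffs[n - k] = c
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs

def poly_at_matrix(coeffs, A):
    """Evaluate c_0 E + c_1 A + ... + c_m A^m by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

A = [[2, 1, 0], [0, 2, 0], [1, 0, 3]]
phi = char_poly(A)                 # t^3 - 7 t^2 + 16 t - 12
print(phi)
print(poly_at_matrix(phi, A))      # the null matrix
```

For this sample matrix, $\varphi(t) = (t - 2)^2 (t - 3)$, and substituting $A$ gives the null matrix, in agreement with the theorem.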

In particular, it is now clear that in the proof of Theorem 4.22, we may take as
the annihilator polynomial the characteristic polynomial of the transformation $\mathcal{A}$.


4.3 Complexification 

In view of the fact that real vector spaces are encountered especially frequently in
applications, we present here another method of determining the properties of linear
transformations of such spaces, proceeding from already proved properties of linear
transformations of complex spaces.

Let $\mathsf{L}$ be a finite-dimensional real vector space. In order to apply our previously
worked-out arguments, it will be necessary to embed it in some complex space $\mathsf{L}^{\mathbb{C}}$.
For this, we shall use the fact that, as we saw in Sect. 3.5, $\mathsf{L}$ is isomorphic to the
space of rows of length $n$ (where $n = \dim \mathsf{L}$), which we denote by $\mathbb{R}^n$.

In view of the usual set inclusion $\mathbb{R} \subset \mathbb{C}$, we may consider $\mathbb{R}^n$ a subset of $\mathbb{C}^n$. In
this case, it is not, of course, a subspace of $\mathbb{C}^n$ as a vector space over the field $\mathbb{C}$.
For example, multiplication by the complex scalar $i$ does not take $\mathbb{R}^n$ into itself. On
the contrary, as is easily seen, we have the decomposition

$$\mathbb{C}^n = \mathbb{R}^n \oplus i\mathbb{R}^n$$

(let us recall that in $\mathbb{C}^n$, multiplication by $i$ is defined for all vectors, and in particular
for vectors in the subset $\mathbb{R}^n$). We shall now denote $\mathbb{R}^n$ by $\mathsf{L}$, while $\mathbb{C}^n$ will be denoted
by $\mathsf{L}^{\mathbb{C}}$. The previous relationship is now written thus:

$$\mathsf{L}^{\mathbb{C}} = \mathsf{L} \oplus i\mathsf{L}. \tag{4.23}$$

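An arbitrary linear transformation $\mathcal{A}$ on a vector space $\mathsf{L}$ (as a space over the field
$\mathbb{R}$) can then be extended to all of $\mathsf{L}^{\mathbb{C}}$ (as a space over the field $\mathbb{C}$). Namely, as follows
from the decomposition (4.23), every vector $\boldsymbol{x} \in \mathsf{L}^{\mathbb{C}}$ can be uniquely represented in
the form $\boldsymbol{x} = \boldsymbol{u} + i\boldsymbol{v}$, where $\boldsymbol{u}, \boldsymbol{v} \in \mathsf{L}$, and we set

$$\mathcal{A}^{\mathbb{C}}(\boldsymbol{x}) = \mathcal{A}(\boldsymbol{u}) + i\mathcal{A}(\boldsymbol{v}). \tag{4.24}$$

We omit the obvious verification that the mapping $\mathcal{A}^{\mathbb{C}}$ defined by the relationship
(4.24) is a linear transformation of the space $\mathsf{L}^{\mathbb{C}}$ (over the field $\mathbb{C}$). Moreover, it is
not difficult to prove that $\mathcal{A}^{\mathbb{C}}$ is the only linear transformation of the space $\mathsf{L}^{\mathbb{C}}$ whose
restriction to $\mathsf{L}$ coincides with $\mathcal{A}$, that is, for which the equality
$\mathcal{A}^{\mathbb{C}}(\boldsymbol{x}) = \mathcal{A}(\boldsymbol{x})$ is
satisfied for all $\boldsymbol{x}$ in $\mathsf{L}$.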

The construction presented here may seem somewhat inelegant, since it uses
an isomorphism of the spaces $\mathsf{L}$ and $\mathbb{R}^n$, for whose construction it is necessary to
choose some basis of $\mathsf{L}$. Although in the majority of applications such a basis exists,
we shall give a construction that does not depend on the choice of basis. For this,
we recall that the space $\mathsf{L}$ can be reconstructed from its dual space $\mathsf{L}^*$ via the
isomorphism $\mathsf{L} \simeq \mathsf{L}^{**}$, which we constructed in Sect. 3.7. In other words,
$\mathsf{L} \simeq \mathfrak{L}(\mathsf{L}^*, \mathbb{R})$,
where as before, $\mathfrak{L}(\mathsf{L}, \mathsf{M})$ denotes the space of linear mappings $\mathsf{L} \to \mathsf{M}$ (here either
all spaces are considered complex or else they are all considered real).

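We now consider $\mathbb{C}$ as a two-dimensional vector space over the field $\mathbb{R}$ and set

$$\mathsf{L}^{\mathbb{C}} = \mathfrak{L}(\mathsf{L}^*, \mathbb{C}), \tag{4.25}$$

where in $\mathfrak{L}(\mathsf{L}^*, \mathbb{C})$, both spaces $\mathsf{L}^*$ and $\mathbb{C}$ are considered real. Thus the
relationship (4.25) makes $\mathsf{L}^{\mathbb{C}}$ into a vector space over the field $\mathbb{R}$. But we can convert
it into a space over the field $\mathbb{C}$ after defining multiplication of vectors in $\mathsf{L}^{\mathbb{C}}$ by
complex scalars. Namely, if $\varphi \in \mathfrak{L}(\mathsf{L}^*, \mathbb{C})$ and $z \in \mathbb{C}$, then we set $z\varphi = \psi$, where
$\psi \in \mathfrak{L}(\mathsf{L}^*, \mathbb{C})$ is defined by the condition

$$\psi(f) = z \cdot \varphi(f) \quad \text{for all } f \in \mathsf{L}^*.$$

It is easily verified that $\mathsf{L}^{\mathbb{C}}$ thus defined is a vector space over the field $\mathbb{C}$, and passage
from $\mathsf{L}$ to $\mathsf{L}^{\mathbb{C}}$ will be the same as described above, for an arbitrary choice of basis of $\mathsf{L}$
(that is, choice of the isomorphism $\mathsf{L} \simeq \mathbb{R}^n$).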

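If $\mathcal{A}$ is a linear transformation of the space $\mathsf{L}$, then we shall define a corresponding
linear transformation $\mathcal{A}^{\mathbb{C}}$ of the space $\mathsf{L}^{\mathbb{C}}$, after assigning to each vector
$\psi \in \mathsf{L}^{\mathbb{C}}$ the
value $\mathcal{A}^{\mathbb{C}}(\psi) \in \mathsf{L}^{\mathbb{C}}$ using the relation

$$(\mathcal{A}^{\mathbb{C}}(\psi))(f) = \psi(\mathcal{A}^*(f)) \quad \text{for all } f \in \mathsf{L}^*,$$

where $\mathcal{A}^* : \mathsf{L}^* \to \mathsf{L}^*$ is the dual transformation to $\mathcal{A}$ (see p. 125). It is clear that
$\mathcal{A}^{\mathbb{C}}$ is indeed a linear transformation of the space $\mathsf{L}^{\mathbb{C}}$, and its restriction to $\mathsf{L}$
coincides with the transformation $\mathcal{A}$, that is, for every $\psi \in \mathsf{L}$, the equality
$\mathcal{A}^{\mathbb{C}}(\psi)(f) = \mathcal{A}(\psi)(f)$ is
satisfied for all $f \in \mathsf{L}^*$.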

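Definition 4.28 The complex vector space $\mathsf{L}^{\mathbb{C}}$ is called the complexification of the
real vector space $\mathsf{L}$, while the transformation $\mathcal{A}^{\mathbb{C}} : \mathsf{L}^{\mathbb{C}} \to \mathsf{L}^{\mathbb{C}}$ is the complexification
of the transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$.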

Remark 4.29 The construction presented above is applicable as well to a more
general situation: using it, it is possible to assign to any vector space $\mathsf{L}$ over an arbitrary
field $\mathbb{K}$ the space $\mathsf{L}^{\mathbb{K}'}$ over the bigger field $\mathbb{K}' \supset \mathbb{K}$, and to the linear transformation
$\mathcal{A}$ of the space $\mathsf{L}$, the linear transformation $\mathcal{A}^{\mathbb{K}'}$ of the space $\mathsf{L}^{\mathbb{K}'}$.

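In the space $\mathsf{L}^{\mathbb{C}}$ that we constructed, it will be useful to introduce the operation of
complex conjugation, which assigns to a vector $\boldsymbol{x} \in \mathsf{L}^{\mathbb{C}}$ the vector
$\overline{\boldsymbol{x}} \in \mathsf{L}^{\mathbb{C}}$, either by
interpreting $\mathsf{L}^{\mathbb{C}}$ as $\mathbb{C}^n$ (with which we began this section) and taking the complex conjugate
of each number in the row $\boldsymbol{x}$, or (equivalently) by using (4.23) and setting
$\overline{\boldsymbol{x}} = \boldsymbol{u} - i\boldsymbol{v}$ for
$\boldsymbol{x} = \boldsymbol{u} + i\boldsymbol{v}$. It is clear that

$$\overline{\boldsymbol{x} + \boldsymbol{y}} = \overline{\boldsymbol{x}} + \overline{\boldsymbol{y}}, \qquad \overline{\alpha\boldsymbol{x}} = \overline{\alpha}\,\overline{\boldsymbol{x}}$$

hold for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}^{\mathbb{C}}$ and an arbitrary complex scalar $\alpha$.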

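The transformation $\mathcal{A}^{\mathbb{C}}$ obtained according to the rule (4.24) from a certain
transformation $\mathcal{A}$ of a real vector space $\mathsf{L}$ will be called real. For a real transformation
$\mathcal{A}^{\mathbb{C}}$, we have the relationship

$$\mathcal{A}^{\mathbb{C}}(\overline{\boldsymbol{x}}) = \overline{\mathcal{A}^{\mathbb{C}}(\boldsymbol{x})}, \tag{4.26}$$

which follows from the definition (4.24) of a transformation $\mathcal{A}^{\mathbb{C}}$. Indeed, if we have
$\boldsymbol{x} = \boldsymbol{u} + i\boldsymbol{v}$, then

$$\mathcal{A}^{\mathbb{C}}(\boldsymbol{x}) = \mathcal{A}(\boldsymbol{u}) + i\mathcal{A}(\boldsymbol{v}), \qquad \overline{\mathcal{A}^{\mathbb{C}}(\boldsymbol{x})} = \mathcal{A}(\boldsymbol{u}) - i\mathcal{A}(\boldsymbol{v}).$$

On the other hand, $\overline{\boldsymbol{x}} = \boldsymbol{u} - i\boldsymbol{v}$, from which follows
$\mathcal{A}^{\mathbb{C}}(\overline{\boldsymbol{x}}) = \mathcal{A}(\boldsymbol{u}) - i\mathcal{A}(\boldsymbol{v})$ and
therefore (4.26).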
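
Rule (4.24) and relationship (4.26) can be tried out in coordinates. The following sketch assumes that $\mathsf{L}$ is identified with $\mathbb{R}^n$, that $\mathcal{A}$ is given by a real matrix, and that vectors of $\mathsf{L}^{\mathbb{C}}$ are lists of Python complex numbers; the names `apply_real`, `complexify`, and `conjugate` are illustrative, not taken from the text.

```python
def apply_real(A, x):
    """Apply a real matrix A (list of rows) to a real vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def complexify(A):
    """Return the map x = u + iv -> A(u) + i A(v), as in rule (4.24)."""
    def A_C(x):
        u = [z.real for z in x]          # real part of x, a vector of L
        v = [z.imag for z in x]          # imaginary part of x, a vector of L
        Au, Av = apply_real(A, u), apply_real(A, v)
        return [complex(p, q) for p, q in zip(Au, Av)]
    return A_C

def conjugate(x):
    """Complex conjugation applied to every coordinate of x."""
    return [z.conjugate() for z in x]

A = [[1, -2], [1, 3]]
A_C = complexify(A)
x = [2 + 1j, -1 - 3j]

lhs = A_C(conjugate(x))      # transformation applied to the conjugate vector
rhs = conjugate(A_C(x))      # conjugate of the transformed vector
print(A_C(x), lhs, rhs)
```

For this sample vector both sides of (4.26) come out as the same list, and on a vector with zero imaginary part the map `A_C` reduces to `apply_real`, which is the restriction property mentioned after (4.24).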

Consider the linear transformation $\mathcal{A}$ of the real vector space $\mathsf{L}$. To it there
corresponds, as shown above, the linear transformation $\mathcal{A}^{\mathbb{C}}$ of the complex vector space
$\mathsf{L}^{\mathbb{C}}$. By Theorem 4.18, the transformation $\mathcal{A}^{\mathbb{C}}$ has an eigenvector
$\boldsymbol{x} \in \mathsf{L}^{\mathbb{C}}$ for which,
therefore, one has the equality

$$\mathcal{A}^{\mathbb{C}}(\boldsymbol{x}) = \lambda\boldsymbol{x}, \tag{4.27}$$

where $\lambda$ is a root of the characteristic polynomial of the transformation $\mathcal{A}$ and,
generally speaking, is a certain complex number. We must distinguish two cases: $\lambda$
real and $\lambda$ complex.

Case 1: $\lambda$ is a real number. In this case, the characteristic polynomial of the
transformation $\mathcal{A}$ has a real root, and therefore $\mathcal{A}$ has an eigenvector in the space $\mathsf{L}$; that
is, $\mathsf{L}$ has a one-dimensional invariant subspace.

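Case 2: $\lambda$ is a complex number. Let $\lambda = a + ib$, where $a$ and $b$ are real numbers,
$b \ne 0$. The eigenvector $\boldsymbol{x}$ can also be written in the form $\boldsymbol{x} = \boldsymbol{u} + i\boldsymbol{v}$, where the
vectors $\boldsymbol{u}, \boldsymbol{v}$ are in $\mathsf{L}$. By definition,
$\mathcal{A}^{\mathbb{C}}(\boldsymbol{x}) = \mathcal{A}(\boldsymbol{u}) + i\mathcal{A}(\boldsymbol{v})$, and then relationship
(4.27), in view of the decomposition (4.23), gives

$$\mathcal{A}(\boldsymbol{v}) = a\boldsymbol{v} + b\boldsymbol{u}, \qquad \mathcal{A}(\boldsymbol{u}) = -b\boldsymbol{v} + a\boldsymbol{u}. \tag{4.28}$$

This means that the subspace $\mathsf{L}' = \langle \boldsymbol{v}, \boldsymbol{u} \rangle$ of the space $\mathsf{L}$ is invariant with respect to
the transformation $\mathcal{A}$. The dimension of the subspace $\mathsf{L}'$ is equal to 2, and the vectors
$\boldsymbol{v}, \boldsymbol{u}$ form a basis of it. Indeed, it suffices to verify their linear independence. The
linear dependence of $\boldsymbol{v}$ and $\boldsymbol{u}$ would imply that $\boldsymbol{v} = \xi\boldsymbol{u}$ (or else that
$\boldsymbol{u} = \xi\boldsymbol{v}$) for some
real $\xi$. But with $\boldsymbol{v} = \xi\boldsymbol{u}$, the second equality of (4.28) would yield the relationship
$\mathcal{A}(\boldsymbol{u}) = (a - b\xi)\boldsymbol{u}$, and that would imply that $\boldsymbol{u}$ is a real eigenvector of the
transformation $\mathcal{A}$, with the real eigenvalue $a - b\xi$; that is, we are dealing with case 1. The
case $\boldsymbol{u} = \xi\boldsymbol{v}$ is similar.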
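
Relationships (4.28) can be verified on a small example. The sketch below uses a hypothetical real matrix whose characteristic polynomial $t^2 - 4t + 5$ has the roots $2 \pm i$, together with an eigenvector of its complexification that was computed by hand; the variable names are illustrative assumptions.

```python
def apply_real(A, x):
    """Apply a real matrix A (list of rows) to a real vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, -2], [1, 3]]          # characteristic polynomial t^2 - 4t + 5
lam = 2 + 1j                   # the eigenvalue a + ib with a = 2, b = 1
x = [2 + 0j, -1 - 1j]          # eigenvector of the complexification for lam

a, b = lam.real, lam.imag
u = [z.real for z in x]        # x = u + iv with u, v real vectors
v = [z.imag for z in x]

Au, Av = apply_real(A, u), apply_real(A, v)

# Right-hand sides of (4.28).
first = [a * vi + b * ui for ui, vi in zip(u, v)]      # a v + b u
second = [-b * vi + a * ui for ui, vi in zip(u, v)]    # -b v + a u

print(Av, first)     # both equal A(v)
print(Au, second)    # both equal A(u)
```

Here $\boldsymbol{u} = (2, -1)$ and $\boldsymbol{v} = (0, -1)$ are linearly independent, so they span a two-dimensional invariant subspace, which in this two-dimensional example is the whole space.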


Uniting cases 1 and 2, we obtain another proof of Theorem 4.22. We observe
that in fact, we have now proved even more than what is asserted in that theorem.
Namely, we have shown that in the two-dimensional invariant subspace $\mathsf{L}'$ there
exists a basis $\boldsymbol{v}, \boldsymbol{u}$ in which the transformation $\mathcal{A}$ gives the formula (4.28), that is, it
has a matrix of the form

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad b \ne 0.$$


Definition 4.30 A linear transformation $\mathcal{A}$ of a real vector space $\mathsf{L}$ is said to be
block-diagonalizable if in some basis, its matrix has the form

$$A = \begin{pmatrix}
\alpha_1 & 0 & \cdots & \cdots & \cdots & 0 \\
0 & \ddots & & & & \vdots \\
\vdots & & \alpha_r & & & \vdots \\
\vdots & & & B_1 & & \vdots \\
\vdots & & & & \ddots & 0 \\
0 & \cdots & \cdots & \cdots & 0 & B_s
\end{pmatrix}, \tag{4.29}$$

where $\alpha_1, \ldots, \alpha_r$ are real matrices of order 1 (that is, real numbers), and $B_1, \ldots, B_s$
are real matrices of order 2 of the form

$$B_j = \begin{pmatrix} a_j & -b_j \\ b_j & a_j \end{pmatrix}, \qquad b_j \ne 0. \tag{4.30}$$

Block-diagonalizable linear transformations are the real analogue of
diagonalizable transformations of complex vector spaces. The connection between these two
concepts is established in the following theorem.

Theorem 4.31 A linear transformation $\mathcal{A}$ of a real vector space $\mathsf{L}$ is
block-diagonalizable if and only if its complexification $\mathcal{A}^{\mathbb{C}}$ is a diagonalizable
transformation of the space $\mathsf{L}^{\mathbb{C}}$.


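Proof Suppose the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ is block-diagonalizable. This
means that in some basis of the space $\mathsf{L}$, its matrix has the form (4.29), which is
equivalent to the decomposition

$$\mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r \oplus \mathsf{M}_1 \oplus \cdots \oplus \mathsf{M}_s, \tag{4.31}$$

where $\mathsf{L}_i$ and $\mathsf{M}_j$ are subspaces that are invariant with respect to the
transformation $\mathcal{A}$. In our case, $\dim \mathsf{L}_i = 1$, so that $\mathsf{L}_i = \langle \boldsymbol{e}_i \rangle$ and
$\mathcal{A}(\boldsymbol{e}_i) = \alpha_i \boldsymbol{e}_i$, and $\dim \mathsf{M}_j = 2$,
where in some basis of the subspace $\mathsf{M}_j$, the restriction of the transformation $\mathcal{A}$ to
$\mathsf{M}_j$ has matrix of the form (4.30). Using formula (4.30), one is easily convinced that
the restriction of $\mathcal{A}^{\mathbb{C}}$ to the two-dimensional subspace $\mathsf{M}_j^{\mathbb{C}}$ has two distinct
complex-conjugate eigenvalues: $\lambda_j$ and $\overline{\lambda}_j$. If $\boldsymbol{f}_j$ and $\overline{\boldsymbol{f}}_j$ are the corresponding eigenvectors,
then in $\mathsf{L}^{\mathbb{C}}$ there is a basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r, \boldsymbol{f}_1, \overline{\boldsymbol{f}}_1, \ldots, \boldsymbol{f}_s, \overline{\boldsymbol{f}}_s$, in which the matrix of the
transformation $\mathcal{A}^{\mathbb{C}}$ assumes the form

$$\begin{pmatrix}
\alpha_1 & & & & & & \\
 & \ddots & & & & & \\
 & & \alpha_r & & & & \\
 & & & \lambda_1 & & & \\
 & & & & \overline{\lambda}_1 & & \\
 & & & & & \ddots & \\
 & & & & & & \lambda_s & \\
 & & & & & & & \overline{\lambda}_s
\end{pmatrix}, \tag{4.32}$$

in which all entries off the main diagonal are equal to zero.
This means that the transformation $\mathcal{A}^{\mathbb{C}}$ is diagonalizable.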

Now suppose, conversely, that $\mathcal{A}^{\mathbb{C}}$ is diagonalizable, that is, in some basis of the
space $\mathsf{L}^{\mathbb{C}}$, the transformation $\mathcal{A}^{\mathbb{C}}$ has the diagonal matrix

$$\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}. \tag{4.33}$$

Among the numbers $\lambda_1, \ldots, \lambda_n$ may be found some that are real and some that are
complex. All the numbers $\lambda_i$ are roots of the characteristic polynomial of the
transformation $\mathcal{A}^{\mathbb{C}}$. But clearly (by the definition of $\mathsf{L}^{\mathbb{C}}$), any basis of the real vector
space $\mathsf{L}$ is a basis of the complex space $\mathsf{L}^{\mathbb{C}}$, and in such a basis, the matrices of the
transformations $\mathcal{A}$ and $\mathcal{A}^{\mathbb{C}}$ coincide. That is, the matrix of the transformation $\mathcal{A}^{\mathbb{C}}$
is real in some basis. This means that its characteristic polynomial has real
coefficients. It then follows from well-known properties of real polynomials that if among
the numbers $\lambda_1, \ldots, \lambda_n$ some are complex, then they come in conjugate pairs $\lambda_j$ and
$\overline{\lambda}_j$, and moreover, $\lambda_j$ and $\overline{\lambda}_j$ occur the same number of times. We may assume that
in the matrix (4.33), the first $r$ numbers are real: $\lambda_i = \alpha_i \in \mathbb{R}$ ($i \le r$), while the
remainder are complex, and moreover, $\lambda_j$ and $\overline{\lambda}_j$ ($j > r$) are adjacent to each other. In
this case, the matrix of the transformation assumes the form (4.32). Along with each
eigenvector $\boldsymbol{e}$ of the transformation $\mathcal{A}^{\mathbb{C}}$, the space $\mathsf{L}^{\mathbb{C}}$ contains the vector
$\overline{\boldsymbol{e}}$. Moreover,
if $\boldsymbol{e}$ has the eigenvalue $\lambda$, then $\overline{\boldsymbol{e}}$ has the eigenvalue $\overline{\lambda}$. This follows easily from the
fact that $\mathcal{A}^{\mathbb{C}}$ is a real transformation and from the relationship
$(\mathsf{L}^{\mathbb{C}})_{\overline{\lambda}} = \overline{(\mathsf{L}^{\mathbb{C}})_{\lambda}}$, which
can be easily verified. Therefore, we may write down the basis in which the
transformation $\mathcal{A}^{\mathbb{C}}$ has the form (4.32) in the form
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_r, \boldsymbol{f}_1, \overline{\boldsymbol{f}}_1, \ldots, \boldsymbol{f}_s, \overline{\boldsymbol{f}}_s$, where
all $\boldsymbol{e}_i$ are in $\mathsf{L}$.

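Let us set $\boldsymbol{f}_j = \boldsymbol{u}_j + i\boldsymbol{v}_j$, where $\boldsymbol{u}_j, \boldsymbol{v}_j \in \mathsf{L}$, and let us consider the subspace
$\mathsf{N}_j = \langle \boldsymbol{u}_j, \boldsymbol{v}_j \rangle$. It is clear that $\mathsf{N}_j$ is invariant with respect to $\mathcal{A}$, and by formula
(4.28), the restriction of $\mathcal{A}$ to the subspace $\mathsf{N}_j$ gives a transformation that in the
basis $\boldsymbol{u}_j, \boldsymbol{v}_j$ has matrix of the form (4.30). We therefore see that

$$\mathsf{L}^{\mathbb{C}} = \langle \boldsymbol{e}_1 \rangle \oplus \cdots \oplus \langle \boldsymbol{e}_r \rangle \oplus i\langle \boldsymbol{e}_1 \rangle \oplus \cdots \oplus i\langle \boldsymbol{e}_r \rangle \oplus \mathsf{N}_1 \oplus i\mathsf{N}_1 \oplus \cdots \oplus \mathsf{N}_s \oplus i\mathsf{N}_s,$$

from which follows the decomposition

$$\mathsf{L} = \langle \boldsymbol{e}_1 \rangle \oplus \cdots \oplus \langle \boldsymbol{e}_r \rangle \oplus \mathsf{N}_1 \oplus \cdots \oplus \mathsf{N}_s,$$

analogous to (4.31). This shows that the transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ is
block-diagonalizable. □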

Similarly, using the notion of complexification, it is possible to prove a real ana- 
logue of Theorems 4.14, 4.18, and 4.21. 


4.4 Orientation of a Real Vector Space 


The real line has two directions: to the left and to the right (from an arbitrarily cho- 
sen point, taken as the origin). Analogously, in real three-dimensional space, there 
are two directions for traveling around a point: clockwise and counterclockwise. We 
shall consider analogous concepts in an arbitrary real vector space (of finite dimen- 
sion). 

Let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ be two bases of a real vector space $\mathsf{L}$. Then there
exists a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ such that

$$\mathcal{A}(\boldsymbol{e}_i) = \boldsymbol{e}'_i, \qquad i = 1, \ldots, n. \tag{4.34}$$

It is clear that for the given pair of bases, there exists only one such linear
transformation $\mathcal{A}$, and moreover, it is not singular ($|\mathcal{A}| \ne 0$).

Definition 4.32 Two bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ are said to have the same
orientation if the transformation $\mathcal{A}$ satisfying the condition (4.34) is proper ($|\mathcal{A}| > 0$;
recall Definition 4.4), and to be oppositely oriented if $\mathcal{A}$ is improper ($|\mathcal{A}| < 0$).

Theorem 4.33 The property of having the same orientation induces an equivalence
relation on the set of all bases of the vector space $\mathsf{L}$.

Proof The definition of equivalence relation (on an arbitrary set) was given on
page xii, and to prove the theorem, we have only to verify symmetry and transitivity,
since reflexivity is completely obvious (for the mapping $\mathcal{A}$, take the identity
transformation $\mathcal{E}$). Since the transformation $\mathcal{A}$ is nonsingular, it follows that relationship
(4.34) can be written in the form $\mathcal{A}^{-1}(\boldsymbol{e}'_i) = \boldsymbol{e}_i$, $i = 1, \ldots, n$, from which follows
the symmetry property of bases having the same orientation: the transformation $\mathcal{A}$
is replaced by $\mathcal{A}^{-1}$, where here $|\mathcal{A}^{-1}| = |\mathcal{A}|^{-1}$, and the sign of the determinant
remains the same.

Let bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ have the same orientation, and suppose bases
$\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ and $\boldsymbol{e}''_1, \ldots, \boldsymbol{e}''_n$ also have the same orientation. By definition, this means
that the transformations $\mathcal{A}$, from (4.34), and $\mathcal{B}$, defined by

$$\mathcal{B}(\boldsymbol{e}'_i) = \boldsymbol{e}''_i, \qquad i = 1, \ldots, n, \tag{4.35}$$

are proper. Replacing in (4.35) the expressions for the vectors $\boldsymbol{e}'_i$ from (4.34), we
obtain

$$\mathcal{B}\mathcal{A}(\boldsymbol{e}_i) = \boldsymbol{e}''_i, \qquad i = 1, \ldots, n,$$

and since $|\mathcal{B}\mathcal{A}| = |\mathcal{B}| \cdot |\mathcal{A}|$, the transformation $\mathcal{B}\mathcal{A}$ is also proper, that is, the bases
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{e}''_1, \ldots, \boldsymbol{e}''_n$ have the same orientation, which completes the proof of
transitivity. □
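
In coordinates, the comparison of orientations reduces to the sign of a determinant. The sketch below rests on an assumption not made in the text: both bases are given by their coordinates in one fixed reference basis, so that the transformation $\mathcal{A}$ of (4.34) has determinant equal to the quotient of the determinants of the two coordinate matrices, and its sign equals the sign of their product. The names `det` and `same_orientation` are illustrative.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over exact fractions."""
    n = len(M)
    rows = [[Fraction(v) for v in row] for row in M]
    result = Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if rows[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            rows[c], rows[p] = rows[p], rows[c]
            result = -result             # a row swap changes the sign
        result *= rows[c][c]
        for i in range(c + 1, n):
            f = rows[i][c] / rows[c][c]
            rows[i] = [vi - f * vc for vi, vc in zip(rows[i], rows[c])]
    return result

def same_orientation(basis1, basis2):
    """True if two bases (lists of coordinate vectors) have the same orientation.

    Assumes both arguments really are bases, that is, have nonzero determinant.
    """
    return det(basis1) * det(basis2) > 0

standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # e2, e3, e1
swapped = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]    # e2, e1, e3

print(same_orientation(standard, cyclic))      # True
print(same_orientation(standard, swapped))     # False
```

A cyclic permutation of three basis vectors preserves orientation, while interchanging two of them reverses it; two bases that are both oppositely oriented to a third have the same orientation as each other, in line with the multiplicativity of the determinant used in the proof.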

We shall denote the set of all bases of the space $\mathsf{L}$ by $\mathfrak{E}$. Theorem 4.33 then
tells us that the property of having the same orientation decomposes the set $\mathfrak{E}$ into
two equivalence classes, that is, we have the decomposition $\mathfrak{E} = \mathfrak{E}_1 \cup \mathfrak{E}_2$, where
$\mathfrak{E}_1 \cap \mathfrak{E}_2 = \emptyset$. To obtain this decomposition in practice, we may proceed as follows:
Choose in $\mathsf{L}$ an arbitrary basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and denote by $\mathfrak{E}_1$ the collection of all bases
that have the same orientation as the chosen basis, and let $\mathfrak{E}_2$ denote the collection
of bases with the opposite orientation. Theorem 4.33 tells us that this
decomposition of $\mathfrak{E}$ does not depend on which basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ we choose. We can assert that
any two bases appearing together in one of the two subsets $\mathfrak{E}_1$ and $\mathfrak{E}_2$ have the
same orientation, and if they belong to different subsets, then they have opposite
orientations.

Definition 4.34 The choice of one of the subsets $\mathfrak{E}_1$ and $\mathfrak{E}_2$ is called an orientation
of the vector space $\mathsf{L}$. Once an orientation has been chosen, the bases lying in the
chosen subset are said to be positively oriented, while those in the other subset are
called negatively oriented.

As can be seen from this definition, the selection of an orientation of a vector
space depends on an arbitrary choice: it would have been equally possible to have
called the positively oriented bases negatively oriented, and vice versa. It is no
accident that in practical applications, the actual choice of orientation is frequently
based on an appeal to something external, such as the structure of the human body (left-right) or the
motion of the Sun in the heavens (clockwise or counterclockwise).

The crucial part of the theory presented in this section is that there is a connection
between orientation and certain topological concepts (such as those presented in the
introduction to this book; see p. xvii).

To pursue this idea, we must first of all define convergence for sequences of
elements of the set $\mathfrak{E}$. We shall do so by introducing on the set $\mathfrak{E}$ a metric, that
is, by converting it into a metric space. This means that we must define a function
$r(\boldsymbol{x}, \boldsymbol{y})$ for all $\boldsymbol{x}, \boldsymbol{y} \in \mathfrak{E}$ taking real values and satisfying properties 1-3 introduced
on p. xvii. We begin by defining a metric $r(A, B)$ on the set $\mathfrak{A}$ of square matrices of
a given order $n$ with real entries.

For a matrix $A = (a_{ij})$ in $\mathfrak{A}$, we let the number $\mu(A)$ equal the maximum
absolute value of its entries:

$$\mu(A) = \max_{i,j = 1, \ldots, n} |a_{ij}|. \tag{4.36}$$

Lemma 4.35 The function $\mu(A)$ defined by relationship (4.36) exhibits the
following properties:

(a) $\mu(A) > 0$ for $A \ne O$ and $\mu(A) = 0$ for $A = O$.

(b) $\mu(A + B) \le \mu(A) + \mu(B)$ for all $A, B \in \mathfrak{A}$.

(c) $\mu(AB) \le n\mu(A)\mu(B)$ for all $A, B \in \mathfrak{A}$.

Proof Property (a) obviously follows from the definition (4.36), while property (b)
follows from an analogous inequality for numbers: $|a_{ij} + b_{ij}| \le |a_{ij}| + |b_{ij}|$. It
remains to prove property (c). Let $A = (a_{ij})$, $B = (b_{ij})$, and $C = AB = (c_{ij})$. Then
$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, and so

$$|c_{ij}| \le \sum_{k=1}^{n} |a_{ik}| |b_{kj}| \le \sum_{k=1}^{n} \mu(A)\mu(B) = n\mu(A)\mu(B).$$

From this it follows that $\mu(C) \le n\mu(A)\mu(B)$. □

We can now convert the set $\mathfrak{A}$ into a metric space by setting for every pair of
matrices $A$ and $B$ in $\mathfrak{A}$,

$$r(A, B) = \mu(A - B). \tag{4.37}$$

Properties 1-3 introduced in the definition of a metric follow from the definitions in
(4.36) and (4.37) and properties (a) and (b) proved in Lemma 4.35.

A metric on $\mathfrak{A}$ enables us to introduce a metric on the set $\mathfrak{E}$ of bases of a vector
space $\mathsf{L}$. Let us fix a distinguished basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and define the number $r(\boldsymbol{x}, \boldsymbol{y})$
for two arbitrary bases $\boldsymbol{x}$ and $\boldsymbol{y}$ in the set $\mathfrak{E}$ as follows. Suppose the bases $\boldsymbol{x}$ and $\boldsymbol{y}$
consist of vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ and $\boldsymbol{y}_1, \ldots, \boldsymbol{y}_n$, respectively. Then there exist linear
transformations $\mathcal{A}$ and $\mathcal{B}$ of the space $\mathsf{L}$ such that

$$\mathcal{A}(\boldsymbol{e}_i) = \boldsymbol{x}_i, \qquad \mathcal{B}(\boldsymbol{e}_i) = \boldsymbol{y}_i, \qquad i = 1, \ldots, n. \tag{4.38}$$

The transformations $\mathcal{A}$ and $\mathcal{B}$ are nonsingular, and by condition (4.38), they are
uniquely determined. Let us denote by $A$ and $B$ the matrices of the transformations
$\mathcal{A}$ and $\mathcal{B}$ in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, and set

$$r(\boldsymbol{x}, \boldsymbol{y}) = r(A, B), \tag{4.39}$$

where $r(A, B)$ is as defined above by relationship (4.37). Properties 1-3 in the
definition of a metric hold for $r(\boldsymbol{x}, \boldsymbol{y})$ from analogous properties of the metric $r(A, B)$.
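
The function $\mu$ of (4.36) and the metric of (4.37) are simple to compute. The following sketch, with illustrative function names and hypothetical sample matrices, evaluates them and checks properties (b) and (c) of Lemma 4.35 on one example; it is a numerical illustration, not a substitute for the proof.

```python
def mu(A):
    """Maximum absolute value of the entries of A, as in (4.36)."""
    return max(abs(v) for row in A for v in row)

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    """Entrywise difference of two matrices of the same size."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def r(A, B):
    """Distance between two matrices, as in (4.37)."""
    return mu(mat_sub(A, B))

A = [[1, -2], [3, 4]]
B = [[0, 5], [-1, 2]]
n = len(A)

print(mu(A), mu(B))                                   # 4 5
print(mu(mat_add(A, B)), mu(A) + mu(B))               # property (b): 6 <= 9
print(mu(mat_mul(A, B)), n * mu(A) * mu(B))           # property (c): 23 <= 40
print(r(A, B))                                        # 7
```

The factor $n$ in property (c) cannot be dropped in general: for the $2 \times 2$ matrix $J$ all of whose entries equal 1, one has $\mu(J) = 1$ and $\mu(J^2) = 2$.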

However, here a difficulty arises: The définition of the metric r(x, y) by rela- 
tionship (4.39) dépends on the choice of some basis e\, ... , e n of the space L. Let 
us choose another basis e\, ... , e' n and let us see how the metric r'(x, y) that re- 
sults differs from r(x, y). To this end, we use the familiar fact that for two bases 
e i , . . . , e n and e \ , . . . , e' n there exists a unique linear (and in addition, nonsingular) 
transformation C : L ^ L taking the first basis into the second: 

e'i = C(ei), i = l,...,n. (4.40) 

Formulas (4.38) and (4.40) show that for linear transformations A — AC~ l and 
33 = 33C~ [ , one has the equality 

A ,(e')=Xi, £(e') = y h i = \,...,n. (4.41) 

Let us dénoté by A' and B' the matrices of the transformations A and S in the basis 
e r { , ... , e' n , and by A and B , the matrices of the transformations A and 33 in this 
basis. Let C be the matrix of the transformation C , that is, by (4.40), the transition 
matrix from the basis e\ , . . . , e' u to the basis e\ , . . . , e n . Then matrices A', A and 

B ' , B are related by A = A'C~ { and B — B'C~ [ . Furthermore, we observe that A 

and A ' are matrices of the same transformation A in two different bases (e\, ... ,e n 
and e\, ... , e' n ), and similarly, B and B' are matrices of the single transformation 33. 
Therefore, by the formula for changing coordinates, we hâve A' — C~ [ AC and 
B' — C -1 BC , and so as a resuit, we obtain the relationship 

A = A'C~ l =C~ l A, B = B'C~ l =C~ [ B. (4.42) 

Returning to the definition (4.39) of a metric on $\mathfrak{A}$, we see that $r'(x, y) = r(\tilde{A}, \tilde{B})$. Substituting into this relationship the expressions (4.42) for the matrices $\tilde{A}$ and $\tilde{B}$, and taking into account definition (4.37) and property (c) of Lemma 4.35, we obtain

\[ r'(x, y) = r(\tilde{A}, \tilde{B}) = r(C^{-1}A, C^{-1}B) = \mu\bigl(C^{-1}(A - B)\bigr) \le n\,\mu(C^{-1})\,\mu(A - B) = \alpha\, r(x, y), \]

where the number $\alpha = n\mu(C^{-1})$ does not depend on the bases $x$ and $y$, but only on $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$. Since the last two bases play symmetric roles in our construction, we may obtain analogously a second inequality $r(x, y) \le \beta r'(x, y)$ with a certain positive constant $\beta$. The relationship

\[ r'(x, y) \le \alpha\, r(x, y), \quad r(x, y) \le \beta\, r'(x, y), \quad \alpha, \beta > 0, \tag{4.43} \]

shows that although the metrics $r(x, y)$ and $r'(x, y)$ defined in terms of different bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ are different, nevertheless, on the set $\mathfrak{A}$, the notion of convergence is the same for both bases. To put this more formally, having chosen in $\mathfrak{E}$ two different bases and having with the help of these bases defined metrics $r(x, y)$ and $r'(x, y)$ on $\mathfrak{E}$, we have thereby defined two different metric spaces $\mathfrak{E}'$ and $\mathfrak{E}''$ with one and the same underlying set $\mathfrak{E}$ but with different metrics $r$ and $r'$ defined on it. Here the identity mapping of the space $\mathfrak{E}$ onto itself is not an isometry of $\mathfrak{E}'$ and $\mathfrak{E}''$, but by relationship (4.43), it is a homeomorphism. We may therefore speak about continuous mappings, paths in $\mathfrak{E}$, and its connected components without specifying precisely which metric we are using.

Let us move on to the question whether two bases of the set $\mathfrak{E}$ can be continuously deformed into each other (see the general definition on p. xx). This question reduces to whether there is a continuous deformation between the nonsingular matrices $A$ and $B$ corresponding to these bases under the selection of some auxiliary basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ (just as with other topological concepts, continuous deformability does not depend on the choice of the auxiliary basis). We wish to emphasize that the condition of nonsingularity of the matrices $A$ and $B$ plays an essential role here.

We shall formulate the notion of continuous deformability for matrices in a certain set $\mathfrak{A}$ (which in our case will be the set of nonsingular matrices).

Definition 4.36 A matrix $A$ is said to be continuously deformable into a matrix $B$ if there exists a family of matrices $A(t)$ in $\mathfrak{A}$ whose elements depend continuously on a parameter $t \in [0, 1]$ such that $A(0) = A$ and $A(1) = B$.

It is obvious that this property of matrices being continuously deformable into each other defines an equivalence relation on the set $\mathfrak{A}$. By definition, we need to verify that the properties of reflexivity, symmetry, and transitivity are satisfied. The verification of all these properties is simple and is given on p. xx.

Let us note one additional property of continuous deformability in the case that the set $\mathfrak{A}$ has another property: for two arbitrary matrices belonging to $\mathfrak{A}$, their product also belongs to $\mathfrak{A}$. It is clear that this property is satisfied if $\mathfrak{A}$ is the set of nonsingular matrices (in subsequent chapters, we shall meet other examples of such sets).

Lemma 4.37 If a matrix $A$ is continuously deformable into $B$, and $C \in \mathfrak{A}$ is an arbitrary matrix, then $AC$ is continuously deformable into $BC$, and $CA$ is continuously deformable into $CB$.

Proof By the hypothesis of the lemma, we have a family $A(t)$ of matrices in $\mathfrak{A}$, where $t \in [0, 1]$, effecting a continuous deformation of $A$ into $B$. To prove the first assertion, we take the family $A(t)C$, and for the second, the family $CA(t)$. These families produce the deformations that we require. □

Theorem 4.38 Two nonsingular square matrices of the same order with real elements are continuously deformable into each other if and only if the signs of their determinants are the same.

Proof Let $A$ and $B$ be the matrices described in the statement of the theorem. The necessary condition that the determinants $|A|$ and $|B|$ be of the same sign is obvious. Indeed, in view of the formula for the expansion of the determinant (Sect. 2.7) or else by its inductive definition (Sect. 2.2), it is clear that the determinant is a polynomial in the elements of the matrix, and consequently, $|A(t)|$ is a continuous function of $t$. But a continuous function taking values with opposite signs at the endpoints of an interval must take the value zero at some point within the interval, while at the same time, the condition $|A(t)| \neq 0$ must be satisfied for all $t \in [0, 1]$.

Let us prove the sufficiency of the condition, at first for matrices for which $|A| > 0$. We shall show that $A$ is continuously deformable into the identity matrix $E$. By Theorem 2.62, the matrix $A$ can be represented as a product of matrices $U_{ij}(c)$, $S_k$, and a diagonal matrix. The matrix $U_{ij}(c)$ is continuously deformable into the identity: as the family $A(t)$, we may take the matrices $U_{ij}(ct)$. Since the $S_k$ are themselves diagonal matrices, we see that (in view of Lemma 4.37) the matrix $A$ is continuously deformable into a diagonal matrix $D$, and from the assumption $|A| > 0$ and the part of the theorem already proved, it follows that $|D| > 0$.

Let

\[ D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}. \]

Every element $d_i$ can be represented in the form $\varepsilon_i p_i$, where $\varepsilon_i = 1$ or $-1$, while $p_i > 0$. The matrix $(p_i)$ of order 1 for $p_i > 0$ can be continuously deformed into $(1)$. For this, it suffices to set $A(t) = (a(t))$, where $a(t) = t + (1 - t)p_i$ for $t \in [0, 1]$. Therefore, the matrix $D$ is continuously deformable into the matrix $D'$, in which all $d_i = \varepsilon_i p_i$ are replaced by $\varepsilon_i$. As we have seen, from this it follows that $|D'| > 0$, that is, the number of $-1$'s on the main diagonal is even. Let us combine them in pairs. If there is $-1$ in the $i$th and $j$th places, then we recall that the matrix

\[ \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \tag{4.44} \]

defines in the plane the central symmetry transformation with respect to the origin, that is, a rotation through the angle $\pi$. If we set

\[ A(t) = \begin{pmatrix} \cos \pi t & -\sin \pi t \\ \sin \pi t & \cos \pi t \end{pmatrix}, \tag{4.45} \]

then we obtain the matrix of rotation through the angle $\pi t$. Here $A(0)$ is the identity and $A(1)$ is the matrix (4.44), so that as $t$ changes from 0 to 1, this family connects the two matrices, and by the symmetry of deformability it effects a continuous deformation of the matrix (4.44) into the identity. It is clear that we thus obtain a continuous deformation of the matrix $D'$ into $E$.

Denoting continuous deformability by $\sim$, we can write down three relationships: $A \sim D$, $D \sim D'$, $D' \sim E$, from which it follows by transitivity that $A \sim E$. From this follows as well the assertion of Theorem 4.38 for two matrices $A$ and $B$ with $|A| > 0$ and $|B| > 0$.

In order to take care of matrices $A$ with $|A| < 0$, we introduce the function $\varepsilon(A) = +1$ if $|A| > 0$ and $\varepsilon(A) = -1$ if $|A| < 0$. It is clear that $\varepsilon(AB) = \varepsilon(A)\varepsilon(B)$. If $\varepsilon(A) = \varepsilon(B) = -1$, then let us set $A^{-1}B = C$. Then $\varepsilon(C) = 1$, and by what was proved previously, $C \sim E$. By Lemma 4.37, it follows that $B = AC \sim AE = A$, and by symmetry, we have $A \sim B$. □
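The rotation step in the proof can be checked numerically. The following is only a minimal sketch, assuming the standard rotation matrix through the angle $\pi t$; the helper names `rotation`, `det2`, and `path` are ours and do not come from the text. It samples the family connecting the identity with the matrix (4.44) and records the determinant at each sample, which should stay equal to 1, so the path never leaves the set of nonsingular matrices.

```python
import math

def rotation(theta):
    """2x2 matrix of the rotation through the angle theta, as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def path(t):
    """A(t): rotation through pi*t, so A(0) = E and A(1) = -E."""
    return rotation(math.pi * t)

# Sample the path at 101 points of [0, 1] and collect the determinants.
samples = [path(k / 100) for k in range(101)]
dets = [det2(m) for m in samples]
```

Reading the path backwards (from $t = 1$ to $t = 0$) gives the deformation of the matrix (4.44) into the identity that the proof uses.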

Taking into account the results of Sect. 3.4 and Lemma 4.37, from Theorem 4.38, we obtain the following result.

Theorem 4.39 Two nonsingular linear transformations of a real vector space are continuously deformable into each other if and only if the signs of their determinants are the same.

Theorem 4.40 Two bases of a real vector space are continuously deformable into each other if and only if they have the same orientation.

Recalling the topological notions introduced earlier of path-connectedness and path-connected component (p. xx), we see that the results we have obtained can be formulated as follows. The set $\mathfrak{A}$ of nonsingular matrices of a given order (or linear transformations of the space $\mathsf{L}$ into itself) can be represented as the union of two path-connected components corresponding to positive and negative determinants. Similarly, the set $\mathfrak{E}$ of all bases of a space $\mathsf{L}$ can be represented as the union of two path-connected components consisting of positively and negatively oriented bases.
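By Theorem 4.38, deciding whether two nonsingular real matrices lie in the same path-connected component comes down to comparing the signs of their determinants. The sketch below is an illustration under our own assumptions: matrices are nested lists with integer or rational entries, and the function names `det` and `same_component` are hypothetical. The determinant is computed by exact Gaussian elimination so that the sign is not affected by rounding.

```python
from fractions import Fraction

def det(matrix):
    """Determinant of a square matrix by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def same_component(a, b):
    """True if the nonsingular matrices a and b have determinants of the same sign."""
    da, db = det(a), det(b)
    if da == 0 or db == 0:
        raise ValueError("both matrices must be nonsingular")
    return (da > 0) == (db > 0)
```

For example, `same_component([[1, 0], [0, 1]], [[0, 1], [1, 0]])` compares the identity with a transposition matrix of determinant $-1$, so the two lie in different components.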





Chapter 5 

Jordan Normal Form 


5.1 Principal Vectors and Cyclic Subspaces 

In the previous chapter, we studied linear transformations of real and complex vector spaces into themselves, and in particular, we found conditions under which a linear transformation of a complex vector space is diagonalizable, that is, has a diagonal matrix in some specially chosen basis (consisting of eigenvectors of the transformation). We showed there that not all transformations of a complex vector space are diagonalizable.

The goal of this chapter is a more complete study of linear transformations of a real or complex vector space to itself, including the investigation of nondiagonalizable transformations. In this chapter as before, we shall denote a vector space by $\mathsf{L}$ and assume that it is finite-dimensional. Moreover, in Sects. 5.1 to 5.3, we shall consider linear transformations of complex vector spaces only.

As already noted, the diagonalizable linear transformations are the simplest class of transformations. However, since this class does not cover all linear transformations, we would like to find a construction that generalizes the construction of diagonalizable linear transformations, and indeed is so general as to encompass all linear transformations. A transformation can be brought into diagonal form if there is a basis consisting of the transformation's eigenvectors. Therefore, let us begin by generalizing the notion of eigenvector.

Let us recall that an eigenvector $\boldsymbol{e} \neq \boldsymbol{0}$ of a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ with eigenvalue $\lambda$ satisfies the condition $\mathcal{A}(\boldsymbol{e}) = \lambda \boldsymbol{e}$, or equivalently, the equality

\[ (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}) = \boldsymbol{0}. \]

A natural generalization of this is contained in the following definition.

Definition 5.1 A nonnull vector $\boldsymbol{e}$ is said to be a principal vector of a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ with eigenvalue $\lambda$ if for some natural number $m$, the following condition is satisfied:

\[ (\mathcal{A} - \lambda\mathcal{E})^m(\boldsymbol{e}) = \boldsymbol{0}. \tag{5.1} \]

The smallest natural number $m$ for which relation (5.1) is satisfied is called the grade of the principal vector $\boldsymbol{e}$.

Example 5.2 An eigenvector is a principal vector of grade 1. 

Example 5.3 Let $\mathsf{L}$ be the vector space of polynomials $x(t)$ of degree at most $n - 1$, and let $\mathcal{A}$ be the linear transformation that maps every function $x(t)$ to its derivative $x'(t)$. Then

\[ \mathcal{A}(x(t)) = x'(t), \quad \mathcal{A}^k(x(t)) = x^{(k)}(t). \]

Since $(t^k)^{(k)} = k! \neq 0$ and $(t^k)^{(k+1)} = 0$, it is obvious that the polynomial $x(t) = t^k$ is a principal vector of the transformation $\mathcal{A}$ of grade $k + 1$ corresponding to the eigenvalue $\lambda = 0$.
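The grade in Example 5.3 can be computed mechanically. The sketch below assumes a representation of our own choosing: a polynomial of degree at most $n - 1$ is stored as the list of its coefficients `[c0, c1, ..., c_{n-1}]`, and the names `derivative`, `grade`, and `monomial` are hypothetical helpers, not notation from the text. Since the eigenvalue is $0$, the grade is simply the number of differentiations needed to reach the zero polynomial.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, ...], where c_j multiplies t**j.

    The result is padded with a zero so that it has the same length as the
    input, that is, the map acts on a fixed space of polynomials.
    """
    n = len(coeffs)
    return [j * coeffs[j] for j in range(1, n)] + [0]

def grade(coeffs):
    """Smallest m >= 1 such that the m-th derivative of a nonzero polynomial vanishes."""
    if not any(coeffs):
        raise ValueError("a principal vector must be nonzero")
    m = 0
    while any(coeffs):
        coeffs = derivative(coeffs)
        m += 1
    return m

def monomial(k, n):
    """Coefficient list of t**k in the space of polynomials of degree below n."""
    return [1 if j == k else 0 for j in range(n)]
```

Calling `grade(monomial(k, n))` for $0 \le k \le n - 1$ reproduces the value $k + 1$ stated in the example.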

Definition 5.4 Let $\boldsymbol{e}$ be a principal vector of grade $m$ corresponding to the eigenvalue $\lambda$. The subspace $\mathsf{M}$ spanned by the vectors

\[ \boldsymbol{e}, \quad (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}), \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}), \tag{5.2} \]

is called the cyclic subspace generated by the vector $\boldsymbol{e}$.

Example 5.5 If $m = 1$, then a cyclic subspace is the one-dimensional subspace $\langle \boldsymbol{e} \rangle$ generated by the eigenvector $\boldsymbol{e}$.

Example 5.6 In Example 5.3, the cyclic subspace generated by the principal vector $x(t) = t^k$ consists of all polynomials of degree at most $k$.

Theorem 5.7 A cyclic subspace $\mathsf{M} \subset \mathsf{L}$ generated by the principal vector $\boldsymbol{e}$ of grade $m$ is invariant under the transformation $\mathcal{A}$ and has dimension $m$.

Proof Since the cyclic subspace $\mathsf{M}$ is spanned by the $m$ vectors (5.2), its dimension is obviously at most $m$. We shall prove that the vectors (5.2) are linearly independent, which will imply that $\dim \mathsf{M} = m$.

Let

\[ \alpha_1 \boldsymbol{e} + \alpha_2 (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}) = \boldsymbol{0}. \tag{5.3} \]

Let us apply the linear transformation $(\mathcal{A} - \lambda\mathcal{E})^{m-1}$ to both sides of this equality. Since by definition (5.1) of a principal vector, we have $(\mathcal{A} - \lambda\mathcal{E})^m(\boldsymbol{e}) = \boldsymbol{0}$, then a fortiori, $(\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{e}) = \boldsymbol{0}$ for every $k > m$. We therefore obtain that

\[ \alpha_1 (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}) = \boldsymbol{0}, \]

and since $(\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}) \neq \boldsymbol{0}$, in view of the fact that $\boldsymbol{e}$ is of grade $m$, we have the equality $\alpha_1 = 0$. Relationship (5.3) now takes the following form:

\[ \alpha_2 (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}) = \boldsymbol{0}. \tag{5.4} \]

Applying the linear transformation $(\mathcal{A} - \lambda\mathcal{E})^{m-2}$ to both sides of equality (5.4), we prove in exactly the same way that $\alpha_2 = 0$. Continuing further in this way, we obtain that in relationship (5.3), all the coefficients $\alpha_1, \ldots, \alpha_m$ are equal to zero. Consequently, the vectors (5.2) are linearly independent, and so we have $\dim \mathsf{M} = m$.

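We shall now prove the invariance of the cyclic subspace $\mathsf{M}$ under the transformation $\mathcal{A}$. Let us set

\[ \boldsymbol{e}_1 = \boldsymbol{e}, \quad \boldsymbol{e}_2 = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}), \quad \ldots, \quad \boldsymbol{e}_m = (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}). \tag{5.5} \]

Since all vectors of the subspace $\mathsf{M}$ can be expressed as linear combinations of the vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$, it suffices to prove that the vectors $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_m)$ can be expressed as linear combinations of $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$. But from relationships (5.1) and (5.5), it is clear that

\[ (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_1) = \boldsymbol{e}_2, \quad (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_2) = \boldsymbol{e}_3, \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_m) = \boldsymbol{0}, \]

that is,

\[ \mathcal{A}(\boldsymbol{e}_1) = \lambda \boldsymbol{e}_1 + \boldsymbol{e}_2, \quad \mathcal{A}(\boldsymbol{e}_2) = \lambda \boldsymbol{e}_2 + \boldsymbol{e}_3, \quad \ldots, \quad \mathcal{A}(\boldsymbol{e}_m) = \lambda \boldsymbol{e}_m, \tag{5.6} \]

which establishes the assertion of the theorem. □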


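Corollary 5.8 The vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ defined by formula (5.5) form a basis of the cyclic subspace $\mathsf{M}$ generated by the principal vector $\boldsymbol{e}$. The matrix of the restriction of the linear transformation $\mathcal{A}$ to the subspace $\mathsf{M}$ in this basis has the form

\[ A = \begin{pmatrix} \lambda & 0 & 0 & \cdots & 0 \\ 1 & \lambda & 0 & \cdots & 0 \\ 0 & 1 & \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda \end{pmatrix}. \tag{5.7} \]

This is an obvious consequence of (5.6).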
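As a small worked illustration (ours, not part of the original text), consider the differentiation transformation of Example 5.3 with the principal vector $t^2$, so that $\lambda = 0$ and $m = 3$. The basis (5.5) and the resulting matrix of the form (5.7) can be written out explicitly:

```latex
% Cyclic basis (5.5) for e = t^2 under differentiation (lambda = 0, m = 3):
\[
  \boldsymbol{e}_1 = t^2, \qquad
  \boldsymbol{e}_2 = \mathcal{A}(\boldsymbol{e}_1) = 2t, \qquad
  \boldsymbol{e}_3 = \mathcal{A}(\boldsymbol{e}_2) = 2, \qquad
  \mathcal{A}(\boldsymbol{e}_3) = 0 .
\]
% The columns of the matrix are the coordinates of the images A(e_1), A(e_2), A(e_3):
\[
  A = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\]
```

The ones sit directly below the main diagonal because each basis vector is mapped to the next one, in agreement with (5.6).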


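Theorem 5.9 Let $\mathsf{M}$ be a cyclic subspace generated by the principal vector $\boldsymbol{e}$ of grade $m$ with eigenvalue $\lambda$. Then an arbitrary vector $\boldsymbol{y} \in \mathsf{M}$ can be written in the form

\[ \boldsymbol{y} = f(\mathcal{A})(\boldsymbol{e}), \]

where $f$ is a polynomial of degree at most $m - 1$. If the polynomial $f(t)$ is not divisible by $t - \lambda$, then the vector $\boldsymbol{y}$ is also a principal vector of grade $m$ and generates the same cyclic subspace $\mathsf{M}$.

Proof The first assertion of the theorem follows at once from the fact that by the definition of a cyclic subspace, every vector $\boldsymbol{y} \in \mathsf{M}$ has the form

\[ \boldsymbol{y} = \alpha_1 \boldsymbol{e} + \alpha_2 (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}), \tag{5.8} \]

that is, $\boldsymbol{y} = f(\mathcal{A})(\boldsymbol{e})$, where the polynomial $f(t)$ is given by

\[ f(t) = \alpha_1 + \alpha_2 (t - \lambda) + \cdots + \alpha_m (t - \lambda)^{m-1}. \]

Let us prove the second assertion. Let $\boldsymbol{y} = f(\mathcal{A})(\boldsymbol{e})$. Then $(\mathcal{A} - \lambda\mathcal{E})^m(\boldsymbol{y}) = \boldsymbol{0}$. Indeed, from the relationships $\boldsymbol{y} = f(\mathcal{A})(\boldsymbol{e})$ and (5.1) and taking into account the property established earlier that two arbitrary polynomials in one and the same linear transformation commute (a consequence of Lemma 4.16 in Sect. 4.1; see p. 142), we obtain the equality

\[ (\mathcal{A} - \lambda\mathcal{E})^m(\boldsymbol{y}) = (\mathcal{A} - \lambda\mathcal{E})^m f(\mathcal{A})(\boldsymbol{e}) = f(\mathcal{A})(\mathcal{A} - \lambda\mathcal{E})^m(\boldsymbol{e}) = \boldsymbol{0}. \]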

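Let us assume that the polynomial $f(t)$ is not divisible by $t - \lambda$. This implies that the coefficient $\alpha_1$ is nonzero. We shall show that we then must have $(\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{y}) \neq \boldsymbol{0}$. Applying the linear transformation $(\mathcal{A} - \lambda\mathcal{E})^{m-1}$ to the vectors on both sides of equality (5.8), we obtain

\[
\begin{aligned}
(\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{y})
&= \alpha_1 (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}) + \alpha_2 (\mathcal{A} - \lambda\mathcal{E})^{m}(\boldsymbol{e}) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{2m-2}(\boldsymbol{e}) \\
&= \alpha_1 (\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}),
\end{aligned}
\]

since we have $(\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{e}) = \boldsymbol{0}$ for every $k \ge m$. From this last relationship and taking into account the conditions $\alpha_1 \neq 0$ and $(\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{e}) \neq \boldsymbol{0}$, it follows that $(\mathcal{A} - \lambda\mathcal{E})^{m-1}(\boldsymbol{y}) \neq \boldsymbol{0}$. Therefore, the vector $\boldsymbol{y}$ is also a principal vector of the linear transformation $\mathcal{A}$ of grade $m$.

Finally, we shall prove that the cyclic subspaces $\mathsf{M}$ and $\mathsf{M}'$ generated by the principal vectors $\boldsymbol{e}$ and $\boldsymbol{y}$ coincide. It is clear that $\mathsf{M}' \subset \mathsf{M}$, since $\boldsymbol{y} \in \mathsf{M}$, and in view of the invariance of the cyclic subspace $\mathsf{M}$, the vector $(\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{y})$ for arbitrary $k$ is also contained in $\mathsf{M}$. But from Theorem 5.7, it follows that $\dim \mathsf{M} = \dim \mathsf{M}' = m$, and therefore, by Theorem 3.24, the inclusion $\mathsf{M}' \subset \mathsf{M}$ implies simply the equality $\mathsf{M}' = \mathsf{M}$. □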
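To see both alternatives of the theorem at work, here is a short worked example of our own in the setting of Example 5.3 (differentiation, $\lambda = 0$), with $\boldsymbol{e} = t^2$ of grade $m = 3$; the polynomials $f$ and $g$ below are chosen purely for illustration.

```latex
% f(t) = 3 + t is not divisible by t - lambda = t, since f(0) = 3 \neq 0:
\[
  \boldsymbol{y} = f(\mathcal{A})(t^2) = 3t^2 + 2t, \qquad
  \mathcal{A}^2(\boldsymbol{y}) = 6 \neq 0, \qquad
  \mathcal{A}^3(\boldsymbol{y}) = 0 ,
\]
% so y has grade 3 and generates all polynomials of degree at most 2.
% g(t) = t is divisible by t:
\[
  \boldsymbol{z} = g(\mathcal{A})(t^2) = 2t, \qquad
  \mathcal{A}(\boldsymbol{z}) = 2 \neq 0, \qquad
  \mathcal{A}^2(\boldsymbol{z}) = 0 ,
\]
% so z has grade 2 only and generates the polynomials of degree at most 1.
```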

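Corollary 5.10 In the notation of Theorem 5.9, for an arbitrary vector $\boldsymbol{y} \in \mathsf{M}$ and scalar $\mu \neq \lambda$, we have the representation $\boldsymbol{y} = (\mathcal{A} - \mu\mathcal{E})(\boldsymbol{z})$ for some vector $\boldsymbol{z} \in \mathsf{M}$. Furthermore, we have the following: either $\boldsymbol{y}$ is a principal vector of grade $m$ that generates the cyclic subspace $\mathsf{M}$, or else $\boldsymbol{y} = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{z})$ for some vector $\boldsymbol{z} \in \mathsf{M}$.

Proof The matrix of the restriction of the linear transformation $\mathcal{A}$ to the subspace $\mathsf{M}$ in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ from (5.5) has the form (5.7). From this, it is easily seen that for arbitrary $\mu \neq \lambda$, the determinant of the restriction of the linear transformation $\mathcal{A} - \mu\mathcal{E}$ to $\mathsf{M}$ is nonzero. From Theorems 3.69 and 3.70, it follows that the restriction of $\mathcal{A} - \mu\mathcal{E}$ to $\mathsf{M}$ is an isomorphism $\mathsf{M} \to \mathsf{M}$, and its image is $(\mathcal{A} - \mu\mathcal{E})(\mathsf{M}) = \mathsf{M}$; that is, for an arbitrary vector $\boldsymbol{y} \in \mathsf{M}$, there exists a vector $\boldsymbol{z} \in \mathsf{M}$ such that $\boldsymbol{y} = (\mathcal{A} - \mu\mathcal{E})(\boldsymbol{z})$.

By Theorem 5.9, a vector $\boldsymbol{y}$ can be represented in the form $\boldsymbol{y} = f(\mathcal{A})(\boldsymbol{e})$, and moreover, if the polynomial $f(t)$ is not divisible by $t - \lambda$, then $\boldsymbol{y}$ is a principal vector of grade $m$ generating the cyclic subspace $\mathsf{M}$. But if $f(t)$ is divisible by $t - \lambda$, that is, $f(t) = (t - \lambda)g(t)$ for some polynomial $g(t)$, then setting $\boldsymbol{z} = g(\mathcal{A})(\boldsymbol{e})$, we obtain the required representation $\boldsymbol{y} = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{z})$. □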


5.2 Jordan Normal Form (Decomposition)

For the proof of the major result of this section and indeed of the entire chapter (the theorem on the decomposition of a complex vector space as a direct sum of cyclic subspaces), we require the following lemma.

Lemma 5.11 For an arbitrary linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ of a complex vector space, there exist a scalar $\lambda$ and an $(n - 1)$-dimensional subspace $\mathsf{L}' \subset \mathsf{L}$ invariant with respect to the transformation $\mathcal{A}$ such that for every vector $\boldsymbol{x} \in \mathsf{L}$, we have the equality

\[ \mathcal{A}(\boldsymbol{x}) = \lambda \boldsymbol{x} + \boldsymbol{y}, \quad \text{where } \boldsymbol{y} \in \mathsf{L}'. \tag{5.9} \]

Proof By Theorem 4.18, every linear transformation of a complex vector space has an eigenvector and associated eigenvalue. Let $\lambda$ be an eigenvalue of the transformation $\mathcal{A}$. Then the transformation $\mathcal{B} = \mathcal{A} - \lambda\mathcal{E}$ is singular (it annihilates the eigenvector), and by Theorem 3.72, its image $\mathcal{B}(\mathsf{L})$ is a subspace $\mathsf{M} \subset \mathsf{L}$ of dimension $m < n$.

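Let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ be a basis of $\mathsf{M}$. We shall extend it arbitrarily to a basis of $\mathsf{L}$ by means of the vectors $\boldsymbol{e}_{m+1}, \ldots, \boldsymbol{e}_n$. It is clear that the subspace

\[ \mathsf{L}' = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_m, \boldsymbol{e}_{m+1}, \ldots, \boldsymbol{e}_{n-1} \rangle \]

has dimension $n - 1$ and includes $\mathsf{M}$, since $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m \in \mathsf{M}$.

Let us now prove equality (5.9). Consider an arbitrary vector $\boldsymbol{x} \in \mathsf{L}$. Then we have $\mathcal{B}(\boldsymbol{x}) \in \mathcal{B}(\mathsf{L}) = \mathsf{M}$, which implies that $\mathcal{B}(\boldsymbol{x}) \in \mathsf{L}'$, since $\mathsf{M} \subset \mathsf{L}'$. Recalling that $\mathcal{A} = \mathcal{B} + \lambda\mathcal{E}$, we obtain that $\mathcal{A}(\boldsymbol{x}) = \mathcal{B}(\boldsymbol{x}) + \lambda \boldsymbol{x}$, and moreover, by our construction, the vector $\boldsymbol{y} = \mathcal{B}(\boldsymbol{x})$ is in $\mathsf{L}'$. From this, the invariance of the subspace $\mathsf{L}'$ easily follows. Indeed, if $\boldsymbol{x} \in \mathsf{L}'$, then in equality (5.9), we have not only $\boldsymbol{y} \in \mathsf{L}'$, but also $\lambda \boldsymbol{x} \in \mathsf{L}'$, which yields that $\mathcal{A}(\boldsymbol{x}) \in \mathsf{L}'$ as well. □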

The main result of this section (the decomposition theorem) is the following.

Theorem 5.12 A finite-dimensional complex vector space $\mathsf{L}$ can be decomposed as a direct sum of cyclic subspaces relative to an arbitrary linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$.

Proof The proof will be by induction on the dimension $n = \dim \mathsf{L}$. It is based on the lemma proved above, and we shall use the same notation. Let $\mathsf{L}' \subset \mathsf{L}$ be the same $(n - 1)$-dimensional subspace invariant with respect to the transformation $\mathcal{A}$ that was discussed in Lemma 5.11.

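We choose any vector $\boldsymbol{e}' \notin \mathsf{L}'$. If $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_{n-1}$ is any basis of the subspace $\mathsf{L}'$, then the vectors $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_{n-1}, \boldsymbol{e}'$ form a basis of $\mathsf{L}$. Indeed, there are $n = \dim \mathsf{L}$ vectors, and so it suffices to prove their linear independence. Let us suppose that

\[ \alpha_1 \boldsymbol{f}_1 + \cdots + \alpha_{n-1} \boldsymbol{f}_{n-1} + \beta \boldsymbol{e}' = \boldsymbol{0}. \tag{5.10} \]

If $\beta \neq 0$, then from this equality, it would follow that $\boldsymbol{e}' \in \mathsf{L}'$. Therefore, $\beta = 0$, and then from equality (5.10), by the linear independence of the vectors $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_{n-1}$, it follows that $\alpha_1 = \cdots = \alpha_{n-1} = 0$.

We shall rely on the fact that the vector $\boldsymbol{e}' \notin \mathsf{L}'$ can be chosen arbitrarily. Until now, it has satisfied only the single condition $\boldsymbol{e}' \notin \mathsf{L}'$, but it is not difficult to see that every vector $\boldsymbol{e}'' = \boldsymbol{e}' + \boldsymbol{x}$, where $\boldsymbol{x} \in \mathsf{L}'$, satisfies the same condition, and this means that any such vector could have been chosen in place of $\boldsymbol{e}'$. Indeed, if $\boldsymbol{e}'' \in \mathsf{L}'$, then considering that $\boldsymbol{x} \in \mathsf{L}'$, we would have $\boldsymbol{e}' \in \mathsf{L}'$, contradicting the assumption.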

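It is obvious that Theorem 5.12 is true for $n = 1$. By the induction hypothesis, we may therefore assume that it holds as well for the subspace $\mathsf{L}'$. Let

\[ \mathsf{L}' = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r \tag{5.11} \]

be the decomposition of $\mathsf{L}'$ as a sum of cyclic subspaces, and moreover, suppose that each cyclic subspace $\mathsf{L}_i$ is generated by its principal vector $\boldsymbol{e}_i$ of grade $m_i$ associated with the eigenvalue $\lambda_i$ and has the basis

\[ \boldsymbol{e}_i, \quad (\mathcal{A} - \lambda_i\mathcal{E})(\boldsymbol{e}_i), \quad \ldots, \quad (\mathcal{A} - \lambda_i\mathcal{E})^{m_i - 1}(\boldsymbol{e}_i). \tag{5.12} \]

By Theorem 5.7, it follows that $\dim \mathsf{L}_i = m_i$ and $n - 1 = m_1 + \cdots + m_r$.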

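For the vector $\boldsymbol{e}'$ chosen at the start of the proof, we have, by the lemma, the equality

\[ \mathcal{A}(\boldsymbol{e}') = \lambda \boldsymbol{e}' + \boldsymbol{y}, \quad \text{where } \boldsymbol{y} \in \mathsf{L}'. \]

In view of the decomposition (5.11), this vector $\boldsymbol{y}$ can be written in the form

\[ \boldsymbol{y} = \boldsymbol{y}_1 + \cdots + \boldsymbol{y}_r, \tag{5.13} \]

where $\boldsymbol{y}_i \in \mathsf{L}_i$. Thanks to Corollary 5.10, we may assert that the vector $\boldsymbol{y}_i$ either can be written in the form $(\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{z}_i)$ for some $\boldsymbol{z}_i \in \mathsf{L}_i$, or is a principal vector of grade $m_i$ associated with the eigenvalue $\lambda$. Changing if necessary the numeration of the vectors $\boldsymbol{y}_i$, we may write

\[ (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}') = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{z}) + \boldsymbol{y}_s + \cdots + \boldsymbol{y}_r, \tag{5.14} \]

where $\boldsymbol{z} = \boldsymbol{z}_1 + \cdots + \boldsymbol{z}_{s-1}$, $\boldsymbol{z}_i \in \mathsf{L}_i$ for all $i = 1, \ldots, s - 1$, and each of the vectors $\boldsymbol{y}_j$ with indices $j = s, \ldots, r$ generates the cyclic subspace $\mathsf{L}_j$.

Here there are two possible cases.

Case 1. In formula (5.14), we have $s - 1 = r$, that is,

\[ (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}') = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{z}), \quad \boldsymbol{z} \in \mathsf{L}'. \]

Using the freedom in the choice of the vector $\boldsymbol{e}'$ discussed above, we set $\boldsymbol{e}'' = \boldsymbol{e}' - \boldsymbol{z}$. Then from the previous relationship, we obtain

\[ (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}'') = \boldsymbol{0}. \]

By definition, this implies that $\boldsymbol{e}''$ is an eigenvector with eigenvalue $\lambda$. Consider the one-dimensional subspace $\mathsf{L}_{r+1} = \langle \boldsymbol{e}'' \rangle$. It is clear that it is cyclic, and moreover,

\[ \mathsf{L} = \mathsf{L}' \oplus \mathsf{L}_{r+1} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r \oplus \mathsf{L}_{r+1}. \]

Theorem 5.12 has been proved in this case.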

Case 2. In formula (5.14), we have $s - 1 < r$. We again set $\boldsymbol{e}'' = \boldsymbol{e}' - \boldsymbol{z}$. Then from (5.14), we obtain that

\[ (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}'') = \boldsymbol{y}_s + \cdots + \boldsymbol{y}_r, \tag{5.15} \]

where by construction, each $\boldsymbol{y}_j$, $j = s, \ldots, r$, is a principal vector of grade $m_j$ corresponding to the eigenvalue $\lambda$ and generating the cyclic subspace $\mathsf{L}_j$.

It is clear that we can always order the vectors $\boldsymbol{y}_s, \ldots, \boldsymbol{y}_r$ in such a way that $m_s \le \cdots \le m_r$. Let us assume that this condition is satisfied. We shall prove that the vector $\boldsymbol{e}''$ is a principal vector of grade $m_r + 1$ with associated eigenvalue $\lambda$, and we shall show that we then have the following decomposition:

\[ \mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_{r-1} \oplus \mathsf{L}'_r, \tag{5.16} \]

where $\mathsf{L}'_r$ is the cyclic subspace generated by the vector $\boldsymbol{e}''$. It is clear that from this will follow the assertion of Theorem 5.12. From the equality (5.15), it follows that

\[ (\mathcal{A} - \lambda\mathcal{E})^{m_r + 1}(\boldsymbol{e}'') = (\mathcal{A} - \lambda\mathcal{E})^{m_r}(\boldsymbol{y}_s) + \cdots + (\mathcal{A} - \lambda\mathcal{E})^{m_r}(\boldsymbol{y}_r). \tag{5.17} \]

Since the principal vectors $\boldsymbol{y}_i$, $i = s, \ldots, r$, have grades $m_i$, and since by our assumption, all the $m_i$ are less than or equal to $m_r$, it follows that $(\mathcal{A} - \lambda\mathcal{E})^{m_r}(\boldsymbol{y}_i) = \boldsymbol{0}$ for all $i = s, \ldots, r$. From this, taking into account (5.17), it follows that $(\mathcal{A} - \lambda\mathcal{E})^{m_r + 1}(\boldsymbol{e}'') = \boldsymbol{0}$. In just the same way, we obtain that

\[ (\mathcal{A} - \lambda\mathcal{E})^{m_r}(\boldsymbol{e}'') = (\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(\boldsymbol{y}_s) + \cdots + (\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(\boldsymbol{y}_r). \tag{5.18} \]

The terms on the right-hand side of this sum belong to the subspaces $\mathsf{L}_s, \ldots, \mathsf{L}_r$. If we had the equality

\[ (\mathcal{A} - \lambda\mathcal{E})^{m_r}(\boldsymbol{e}'') = \boldsymbol{0}, \]

then it would follow that all the terms on the right-hand side of (5.18) would be equal to zero, since the subspaces $\mathsf{L}_s, \ldots, \mathsf{L}_r$ form a direct sum. In particular, we would obtain that $(\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(\boldsymbol{y}_r) = \boldsymbol{0}$, and this would contradict the fact that the principal vector $\boldsymbol{y}_r$ has grade $m_r$. We therefore conclude that $(\mathcal{A} - \lambda\mathcal{E})^{m_r}(\boldsymbol{e}'') \neq \boldsymbol{0}$, and consequently, the principal vector $\boldsymbol{e}''$ has grade $m_r + 1$.

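It remains to prove relationship (5.16). We observe that the dimensions of the spaces $\mathsf{L}_1, \ldots, \mathsf{L}_{r-1}$ are equal to $m_1, \ldots, m_{r-1}$, while the dimension of $\mathsf{L}'_r$ is equal to $m_r + 1$. Therefore, from equality (5.12), it follows that the sum of the dimensions of the terms on the right-hand side of (5.16) equals the dimension of the left-hand side. Therefore, in order to prove the relationship (5.16), it suffices by Corollary 3.40 (p. 96) to prove that an arbitrary vector in the space $\mathsf{L}$ can be represented as the sum of vectors from the subspaces $\mathsf{L}_1, \ldots, \mathsf{L}_{r-1}, \mathsf{L}'_r$.

It suffices to prove this last assertion for all vectors in a certain basis of the space $\mathsf{L}$. Such a basis is obtained in particular if we combine the vector $\boldsymbol{e}''$ and the vectors of certain bases of the subspaces $\mathsf{L}_1, \ldots, \mathsf{L}_r$. For the vector $\boldsymbol{e}''$, this assertion is obvious, since $\boldsymbol{e}'' \in \mathsf{L}'_r$. In just the same way, the assertion is clear for any vector in the basis of one of the subspaces $\mathsf{L}_1, \ldots, \mathsf{L}_{r-1}$. It remains to prove this for vectors in some basis of the subspace $\mathsf{L}_r$. Such a basis, for example, comprises the vectors

\[ \boldsymbol{y}_r, \quad (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{y}_r), \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(\boldsymbol{y}_r). \]

From (5.15), it follows that

\[ \boldsymbol{y}_r = -(\boldsymbol{y}_s + \cdots + \boldsymbol{y}_{r-1}) + (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}''), \]

and this means that

\[ (\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{y}_r) = -(\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{y}_s) - \cdots - (\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{y}_{r-1}) + (\mathcal{A} - \lambda\mathcal{E})^{k+1}(\boldsymbol{e}'') \]

for all $k = 1, \ldots, m_r - 1$. And this establishes what we needed to show: since

\[ \boldsymbol{y}_s \in \mathsf{L}_s, \quad \ldots, \quad \boldsymbol{y}_{r-1} \in \mathsf{L}_{r-1}, \quad \boldsymbol{e}'' \in \mathsf{L}'_r, \]

and since the spaces $\mathsf{L}_s, \ldots, \mathsf{L}_{r-1}$ and $\mathsf{L}'_r$ are invariant, it follows that

\[ (\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{y}_s) \in \mathsf{L}_s, \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})^k(\boldsymbol{y}_{r-1}) \in \mathsf{L}_{r-1}, \quad (\mathcal{A} - \lambda\mathcal{E})^{k+1}(\boldsymbol{e}'') \in \mathsf{L}'_r. \]

This completes the proof of Theorem 5.12. □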

Let us note that in the passage from the subspace $\mathsf{L}'$ to $\mathsf{L}$ for a given $\lambda$, the decomposition into cyclic subspaces changes in the following way: either there appears in the decomposition one more one-dimensional subspace (case 1), or else the dimension of one of the cyclic subspaces increases by 1 (case 2).

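Let the decomposition into a direct sum of subspaces, whose existence is established by Theorem 5.12, have the form

\[ \mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r. \]

In each of the subspaces $\mathsf{L}_i$, we will select a basis of the form (5.5) and combine them into a single basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$. In this basis, the matrix $A$ of the transformation $\mathcal{A}$ has the block-diagonal form

\[ A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}, \tag{5.19} \]

where the matrices $A_i$ have (by Corollary 5.8) the form

\[ A_i = \begin{pmatrix} \lambda_i & 0 & 0 & \cdots & 0 \\ 1 & \lambda_i & 0 & \cdots & 0 \\ 0 & 1 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda_i \end{pmatrix}. \tag{5.20} \]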

The matrix $A$ given by formulas (5.19) and (5.20) is said to be in Jordan normal form, while the matrices $A_i$ are called Jordan blocks. We therefore have the following result, which is nothing more than a reformulation of Theorem 5.12.

Theorem 5.13 For every linear transformation of a finite-dimensional complex vector space, there exists a basis of that space in which the matrix of the transformation is in Jordan normal form.

Corollary 5.14 Every complex matrix is similar to a matrix in Jordan normal form.

Proof As we saw in Chap. 3, an arbitrary square matrix $A$ of order $n$ is the matrix of some linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ in some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. By Theorem 5.13, in some other basis $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$, the matrix $A'$ of the transformation $\mathcal{A}$ is in Jordan normal form. As established in Sect. 3.4, the matrices $A$ and $A'$ are related by the relationship (3.43), for some nonsingular matrix $C$ (the transition matrix from the first basis to the second). This implies that the matrices $A$ and $A'$ are similar. □
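A matrix in Jordan normal form is determined by the list of its blocks, so it is easy to assemble one explicitly. The sketch below is an illustration under our own conventions: matrices are nested lists, the input is a list of hypothetical `(eigenvalue, size)` pairs, and the function names are ours. It follows the convention of formulas (5.19) and (5.20), with the ones placed directly below the main diagonal.

```python
def jordan_block(lam, m):
    """m x m Jordan block as in (5.20): lam on the diagonal, 1 just below it."""
    return [[lam if i == j else 1 if i == j + 1 else 0 for j in range(m)]
            for i in range(m)]

def jordan_matrix(blocks):
    """Block-diagonal matrix as in (5.19), built from (eigenvalue, size) pairs."""
    n = sum(size for _, size in blocks)
    result = [[0] * n for _ in range(n)]
    offset = 0
    for lam, size in blocks:
        block = jordan_block(lam, size)
        # Copy the block onto the diagonal, starting at position (offset, offset).
        for i in range(size):
            for j in range(size):
                result[offset + i][offset + j] = block[i][j]
        offset += size
    return result

# One block of order 2 with eigenvalue 2 and one block of order 1 with eigenvalue 5.
J = jordan_matrix([(2, 2), (5, 1)])
```

Many other texts place the ones above the diagonal instead; that convention corresponds to listing the basis (5.5) of each cyclic subspace in the reverse order.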


5.3 Jordan Normal Form (Uniqueness) 

We shall now explore the extent to which the decomposition of the vector space $\mathsf{L}$ as a direct sum of cyclic subspaces relative to a given linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ is unique. First of all, let us remark that in such a decomposition

\[ \mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r, \tag{5.21} \]

the subspaces $\mathsf{L}_i$ themselves are in no way uniquely determined. The simplest example of this is the identity transformation $\mathcal{A} = \mathcal{E}$. For this transformation, every nonnull vector is an eigenvector, which means that every one-dimensional subspace is a cyclic subspace generated by a principal vector of grade 1. Therefore, any decomposition of the space $\mathsf{L}$ as a direct sum of one-dimensional subspaces is a decomposition as a direct sum of cyclic subspaces, and such a decomposition exists for every basis of the space $\mathsf{L}$; that is, there are infinitely many of them.

However, we shall prove that the eigenvalues $\lambda_i$ and the dimensions of the cyclic subspaces associated with these numbers coincide for every possible decomposition (5.21). As we have seen, the Jordan normal form is determined solely by the eigenvalues $\lambda_i$ and the dimensions of the associated subspaces (see formulas (5.19) and (5.20)). This will give us the uniqueness of the Jordan normal form.

Theorem 5.15 The Jordan normal form of a linear transformation is completely determined by the transformation itself up to the ordering of the Jordan blocks. In other words, for the decomposition (5.21) of a vector space $\mathsf{L}$ as a direct sum of subspaces that are cyclic for some linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$, the eigenvalues $\lambda_i$ and dimensions $m_i$ of the associated cyclic subspaces $\mathsf{L}_i$ depend only on the transformation $\mathcal{A}$ and are the same for all decompositions (5.21).

Proof Let $\lambda$ be some eigenvalue of the linear transformation $\mathcal{A}$ and let (5.21) be one possible decomposition. Let us denote by $l_m$ ($m = 1, 2, \ldots$) the integer that indicates how many $m$-dimensional cyclic subspaces associated with $\lambda$ are encountered in (5.21).

We shall give a method for calculating $l_m$ based on $\lambda$ and $\mathcal{A}$ only. This will prove that this number in fact does not depend on the decomposition (5.21).

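Let us apply to both sides of equality (5.21) the transformation $(\mathcal{A} - \lambda\mathcal{E})^i$ with some $i \ge 1$. It is clear that

\[ (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}) = (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_1) \oplus \cdots \oplus (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_r). \tag{5.22} \]

We shall now determine the dimensions of the subspaces $(\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k)$. In the course of proving the corollary to Theorem 5.9 (Corollary 5.10), we established that for arbitrary $\mu \neq \lambda$, the restriction of the linear transformation $\mathcal{A} - \mu\mathcal{E}$ to $\mathsf{M}$ is an isomorphism, and its image $(\mathcal{A} - \mu\mathcal{E})(\mathsf{M})$ is equal to $\mathsf{M}$. Therefore, if $\mathsf{L}_k$ corresponds to the number $\lambda_k \neq \lambda$, then

\[ (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k) = \mathsf{L}_k, \quad \lambda_k \neq \lambda. \tag{5.23} \]

But if $\lambda_k = \lambda$, then choosing in $\mathsf{L}_k$ the basis $\boldsymbol{e}, (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}), \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_k - 1}(\boldsymbol{e})$, where $m_k = \dim \mathsf{L}_k$, that is, $m_k$ is equal to the grade of the principal vector $\boldsymbol{e}$, we obtain that if $i \ge m_k$, then the subspace $(\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k)$ consists solely of the null vector, while if $i < m_k$, then

\[ (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k) = \bigl\langle (\mathcal{A} - \lambda\mathcal{E})^i(\boldsymbol{e}), \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_k - 1}(\boldsymbol{e}) \bigr\rangle, \]

and moreover, the vectors $(\mathcal{A} - \lambda\mathcal{E})^i(\boldsymbol{e}), \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_k - 1}(\boldsymbol{e})$ are linearly independent. Therefore, in the case $\lambda_k = \lambda$, we obtain the formula

\[ \dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k) = \begin{cases} 0, & \text{if } i \ge m_k, \\ m_k - i, & \text{if } i < m_k. \end{cases} \tag{5.24} \]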


Let us denote by $n'$ the sum of the dimensions of those subspaces $\mathsf{L}_k$ that correspond to the numbers $\lambda_k \neq \lambda$. Then from formulas (5.22)-(5.24), it follows that

\[ \dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}) = l_{i+1} + 2l_{i+2} + \cdots + (p - i)l_p + n', \tag{5.25} \]

where $p$ is the maximal dimension of a cyclic subspace associated with the given value $\lambda$ in the decomposition (5.21). Indeed, from the equality (5.22), we obtain that

\[ \dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}) = \dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_1) + \cdots + \dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_r). \tag{5.26} \]

It follows from formula (5.23) that the terms $\dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k)$ with $\lambda_k \neq \lambda$ in the sum give $n'$. In view of formula (5.24), the terms $\dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k)$ with $\lambda_k = \lambda$ and $m_k \le i$ are equal to zero. Furthermore, from the same formula (5.24), it follows that if $m_k = i + 1$, then $\dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k) = 1$, and the number of subspaces $\mathsf{L}_k$ of dimension $m_k = i + 1$ will be equal to $l_{i+1}$ by the definition of the number $l_m$. Therefore, in formula (5.26), the number of terms equal to 1 will be $l_{i+1}$. Similarly, the number of subspaces $\mathsf{L}_k$ of dimension $m_k = i + 2$ will be equal to $l_{i+2}$, but for these, we already have $\dim (\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}_k) = 2$, whence on the right-hand side of (5.25), there appears the term $2l_{i+2}$, and so on. From this follows the equality (5.25).

Let us recall that in Sect. 3.6, we defined the notion of the rank $\operatorname{rk} \mathcal{B}$ of an arbitrary linear transformation $\mathcal{B} : \mathsf{L} \to \mathsf{L}$. Here, $\operatorname{rk} \mathcal{B}$ coincides with the dimension of the image $\mathcal{B}(\mathsf{L})$ and is equal to the rank of the matrix $B$ of this transformation, regardless of the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ in terms of which the matrix of the transformation is written.

Let us now set $r_i = \operatorname{rk}(\mathcal{A} - \lambda\mathcal{E})^i$ for $i = 1, \ldots, p$. Let us write the relationships (5.25) for $i = 1, \ldots, p$ by taking into account the fact that

$$\dim(\mathcal{A} - \lambda\mathcal{E})^i(\mathsf{L}) = \operatorname{rk}(\mathcal{A} - \lambda\mathcal{E})^i = r_i \quad\text{and}\quad l_s = 0 \ \text{for } s > p,$$

and let us consider also the equality

$$n = l_1 + 2l_2 + \cdots + pl_p + n',$$

which follows from formula (5.21) or from (5.25) for $i = 0$. As a result, we obtain the relationships

$$\begin{aligned}
l_1 + 2l_2 + 3l_3 + \cdots + pl_p + n' &= n, \\
l_2 + 2l_3 + \cdots + (p - 1)l_p + n' &= r_1, \\
&\ \,\vdots \\
l_p + n' &= r_{p-1}, \\
n' &= r_p,
\end{aligned}$$

from which it is possible to express $l_1, \ldots, l_p$ in terms of $r_1, \ldots, r_p$. Indeed, subtracting from each equation the one following it, we obtain

$$\begin{aligned}
l_1 + l_2 + \cdots + l_p &= n - r_1, \\
l_2 + \cdots + l_p &= r_1 - r_2, \\
&\ \,\vdots \\
l_p &= r_{p-1} - r_p.
\end{aligned} \tag{5.27}$$

Repeating this same operation, we obtain

$$\begin{aligned}
l_1 &= n - 2r_1 + r_2, \\
l_2 &= r_1 - 2r_2 + r_3, \\
&\ \,\vdots \\
l_{p-1} &= r_{p-2} - 2r_{p-1} + r_p, \\
l_p &= r_{p-1} - r_p.
\end{aligned} \tag{5.28}$$

From these relationships, it follows that the numbers $l_i$ are determined by the numbers $r_i$, which means that they depend only on the transformation $\mathcal{A}$. □
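Formulas (5.28) can be tried out numerically. The following is a minimal sketch, not part of the original text: the helper names `rank`, `matmul`, and `block_counts` are hypothetical, exact arithmetic with `Fraction` is an assumption made to avoid rounding, and the convention $r_0 = n$ is used so that every line of (5.28) reads $l_m = r_{m-1} - 2r_m + r_{m+1}$.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def block_counts(A, lam):
    """Return {m: l_m}, the number of Jordan blocks of each order m for lam."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    ranks = [n]                                   # r_0 = n by convention
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n + 1):                        # r_1, ..., r_{n+1}
        P = matmul(P, N)
        ranks.append(rank(P))
    counts = {}
    for m in range(1, n + 1):
        l_m = ranks[m - 1] - 2 * ranks[m] + ranks[m + 1]
        if l_m:
            counts[m] = l_m
    return counts

# Example matrix: blocks of orders 2 and 1 for the value 2, one block for 3.
A = [[2, 0, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]
print(block_counts(A, 2), block_counts(A, 3), block_counts(A, 5))
```

For a number that is not an eigenvalue (here 5), all the ranks equal $n$ and the result is empty, in agreement with Corollary 5.16 below.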


Corollary 5.16 In the decomposition (5.21), the subspace associated with the number $\lambda$ occurs if and only if $\lambda$ is an eigenvalue of the transformation $\mathcal{A}$.


Proof Indeed, if $\lambda$ is not an eigenvalue, then the transformation $\mathcal{A} - \lambda\mathcal{E}$ is nonsingular, and this means that the transformations $(\mathcal{A} - \lambda\mathcal{E})^i$ are nonsingular as well. In other words, $r_i = n$ for all $i = 1, 2, \ldots$. From the formulas (5.27), it then follows that all $l_i$ are equal to 0, that is, in the decomposition (5.21), there are no subspaces associated with $\lambda$. Conversely, if all $l_i = 0$, then from (5.28), we obtain that $r_n = r_{n-1} = \cdots = r_1 = n$. But the equality $r_1 = n$ means precisely that the transformation $\mathcal{A} - \lambda\mathcal{E}$ is nonsingular. □

Corollary 5.17 Square matrices $A$ and $B$ of order $n$ are similar if and only if their eigenvalues coincide and for each eigenvalue $\lambda$ and each $i \le n$, we have

$$\operatorname{rk}(A - \lambda E)^i = \operatorname{rk}(B - \lambda E)^i. \tag{5.29}$$

Proof The necessity of conditions (5.29) is obvious, since if $A$ and $B$ are similar, then so are the matrices $(A - \lambda E)^i$ and $(B - \lambda E)^i$, which means that their ranks are the same.

We now prove sufficiency. Suppose that the conditions (5.29) are satisfied. We shall construct transformations $\mathcal{A}\colon \mathsf{L} \to \mathsf{L}$ and $\mathcal{B}\colon \mathsf{L} \to \mathsf{L}$ having in some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the vector space $\mathsf{L}$ the matrices $A$ and $B$. Let the transformation $\mathcal{A}$ be brought into Jordan normal form in some basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$, and the same for $\mathcal{B}$ in some basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$. In view of equality (5.29) and using formulas (5.25), we conclude that these Jordan forms coincide. This means that the matrices $A$ and $B$ are similar to some third matrix, and consequently, by transitivity, they are similar to each other. □

As an additional application of formulas (5.27), let us determine when a matrix can be brought into diagonal form, which is a special case of Jordan form in which all the Jordan blocks are of order 1. In other words, all the cyclic subspaces are of dimension one. This means that $l_2 = \cdots = l_n = 0$. From the second equality in formulas (5.27), it follows that for this, it is necessary and sufficient that the condition $r_1 = r_2$ be satisfied (for sufficiency, we must use the fact that $l_i \ge 0$). We have thus proved the following criterion.

Theorem 5.18 A linear transformation $\mathcal{A}$ can be brought into diagonal form if and only if for every one of its eigenvalues $\lambda$, we have

$$\operatorname{rk}(\mathcal{A} - \lambda\mathcal{E}) = \operatorname{rk}(\mathcal{A} - \lambda\mathcal{E})^2.$$

Of course, an analogous criterion holds for matrices.


5.4 Real Vector Spaces 

Up to this point, we have been considering linear transformations of complex vector spaces (this is related to the fact that we have continually relied on the existence of an eigenvector for every linear transformation, which may not be true in the real case). However, the theory that we have built up gives us a great deal of information about the case of transformations of real vector spaces as well, which are especially important in applications.

Let us assume that the real vector space $\mathsf{L}_0$ is embedded in the complex vector space $\mathsf{L}$, for example its complexification (as was done in Sect. 4.3), while a linear transformation $\mathcal{A}_0$ of the space $\mathsf{L}_0$ determines a real linear transformation $\mathcal{A}$ of the space $\mathsf{L}$. In this section and the following one, a bar will denote complex conjugation.

Theorem 5.19 In the decomposition of the space $\mathsf{L}$ into cyclic subspaces with respect to the real linear transformation $\mathcal{A}$, the number of cyclic $m$-dimensional subspaces associated with the eigenvalue $\lambda$ is equal to the number of cyclic $m$-dimensional subspaces associated with the complex-conjugate eigenvalue $\bar\lambda$.

Proof Since the characteristic polynomial of a real transformation $\mathcal{A}$ has real coefficients, it follows that for each root $\lambda$, the number $\bar\lambda$ is also a root of the characteristic polynomial. Let us denote, as we did in the proof of Theorem 5.15, the number of cyclic $m$-dimensional subspaces for the eigenvalue $\lambda$ by $l_m$, and the number of cyclic $m$-dimensional subspaces for the eigenvalue $\bar\lambda$ by $l'_m$. In addition, we define $r_i = \operatorname{rk}(\mathcal{A} - \lambda\mathcal{E})^i$ and $r'_i = \operatorname{rk}(\mathcal{A} - \bar\lambda\mathcal{E})^i$. Formulas (5.28) express the numbers $l_m$ in terms of $r_m$. Since these formulas hold for every eigenvalue, they also express the numbers $l'_m$ in terms of $r'_m$. Consequently, it suffices to show that $r'_i = r_i$, from which it will follow that $l'_i = l_i$, which is the assertion of the theorem.

To this end, we consider some basis of the space $\mathsf{L}_0$ (as a real vector space). It will also be a basis of the space $\mathsf{L}$ (as a complex vector space). Let $A$ be the matrix of the linear transformation $\mathcal{A}$ in this basis. By definition, it coincides with the matrix of the linear transformation $\mathcal{A}_0$ in the same basis, and therefore, it consists of real numbers. Hence the matrix $A - \bar\lambda E$ is obtained from $A - \lambda E$ by replacing all the elements by their complex conjugates. We shall write this as

$$A - \bar\lambda E = \overline{A - \lambda E}.$$

It is easy to see that from this, it follows that for every $i > 0$, the equation

$$(A - \bar\lambda E)^i = \overline{(A - \lambda E)^i}$$

is satisfied. Thus our assertion is reduced to the following: if $B$ is a matrix with complex elements and the matrix $\bar{B}$ is obtained from $B$ by replacing all its elements with their complex conjugates, then $\operatorname{rk}\bar{B} = \operatorname{rk} B$. The proof of this follows at once, however, from the definition of the rank of a matrix as the maximal order of the nonzero minors: indeed, it is clear that the minors of the matrix $\bar{B}$ are obtained by complex conjugation from the minors of $B$ with the same indices of rows and columns, which completes the proof of the theorem. □


Thus according to Theorem 5.19, the Jordan normal form (5.19) of a real linear transformation consists of Jordan blocks (5.20) corresponding to real eigenvalues $\lambda_i$ and pairs of Jordan blocks of the same order corresponding to complex-conjugate pairs of eigenvalues $\lambda_i$ and $\bar\lambda_i$.

Let us see what this gives us for the classification of linear transformations of a real vector space $\mathsf{L}_0$. Let us consider the simple example of the case $\dim \mathsf{L}_0 = 2$. By Theorem 5.19, the Jordan normal form of the linear transformation $\mathcal{A}$ of the complex space $\mathsf{L}$ can have one of the three following forms:

$$\text{(a)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad \text{(b)}\ \begin{pmatrix} \alpha & 0 \\ 1 & \alpha \end{pmatrix}, \qquad \text{(c)}\ \begin{pmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{pmatrix},$$

where $\alpha$ and $\beta$ are real, and $\lambda$ is a complex, not real, number, that is, $\lambda = a + ib$, where $i^2 = -1$ and $b \ne 0$.

In cases (a) and (b), as can be seen from the definition of the linear transformation $\mathcal{A}$, the matrix of the transformation $\mathcal{A}_0$ already has the indicated form in some basis of the real vector space $\mathsf{L}_0$.

As we showed in Sect. 4.3, in case (c), the transformation $\mathcal{A}_0$ has in some basis the matrix

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$

Thus we see that an arbitrary linear transformation of a two-dimensional real vector space has in some basis one of three forms:

$$\text{(a)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad \text{(b)}\ \begin{pmatrix} \alpha & 0 \\ 1 & \alpha \end{pmatrix}, \qquad \text{(c)}\ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \tag{5.30}$$

where $\alpha, \beta, a, b$ are real numbers and $b \ne 0$. By formula (3.43), this implies that an arbitrary real square matrix of order 2 is similar to a matrix having one of the three forms of (5.30).
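As an illustration, the choice among the three forms of (5.30) can be read off from the trace and determinant of the matrix. The sketch below is not from the original text: the function name `real_normal_form` is hypothetical, the input is assumed to have exactly representable entries (so that the test for a multiple root is meaningful), and in case (c) the sign of $b$ is chosen positive, which is an arbitrary normalization.

```python
import math

def real_normal_form(A):
    """Return the label and the matrix of the form in (5.30) similar to A."""
    (p, q), (r, s) = A
    tr, det = p + s, p * s - q * r
    disc = tr * tr - 4 * det            # discriminant of |A - lambda E|
    if disc > 0:                        # two distinct real roots
        root = math.sqrt(disc)
        return "a", [[(tr - root) / 2, 0], [0, (tr + root) / 2]]
    if disc < 0:                        # roots a + ib and a - ib, b != 0
        a, b = tr / 2, math.sqrt(-disc) / 2
        return "c", [[a, -b], [b, a]]
    alpha = tr / 2                      # multiple real root
    if q == 0 and r == 0:               # A is already alpha * E
        return "a", [[alpha, 0], [0, alpha]]
    return "b", [[alpha, 0], [1, alpha]]

print(real_normal_form([[1, 2], [2, 1]]))    # distinct real roots -1 and 3
print(real_normal_form([[2, 1], [0, 2]]))    # multiple root, not diagonalizable
print(real_normal_form([[0, -1], [1, 0]]))   # rotation by a right angle
```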

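In a completely analogous way, we may study the general case of linear transformations in a real vector space of arbitrary dimension.¹ By the same line of argument, one can show that every real square matrix is similar to a block-diagonal matrix

$$\begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix},$$

where $A_i$ is either a Jordan block (5.20) with a real eigenvalue $\lambda_i$ or a matrix of even order having the block form

$$\begin{pmatrix} \Lambda_i & 0 & 0 & \cdots & 0 & 0 \\ E & \Lambda_i & 0 & \cdots & 0 & 0 \\ 0 & E & \Lambda_i & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & E & \Lambda_i & 0 \\ 0 & 0 & \cdots & 0 & E & \Lambda_i \end{pmatrix},$$

in which the blocks $\Lambda_i$ and $E$ are matrices of order 2:

$$\Lambda_i = \begin{pmatrix} a_i & -b_i \\ b_i & a_i \end{pmatrix}, \qquad E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

¹ One may find a detailed proof in, for example, the book Lectures on Algebra, by D.K. Faddeev (in Russian) or in Sect. 3.4 of Matrix Analysis, by Roger Horn and Charles Johnson. See the references section for details.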



5.5 Applications* 


For a matrix $A$ in Jordan normal form, it is easy to calculate the value of $f(A)$, where $f(x)$ is any polynomial of degree $n$. First of all, let us note that if the matrix $A$ is in block-diagonal form

$$A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}$$

with arbitrary blocks $A_1, \ldots, A_r$, then

$$f(A) = \begin{pmatrix} f(A_1) & 0 & \cdots & 0 \\ 0 & f(A_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(A_r) \end{pmatrix}.$$

This follows immediately from the decomposition of the space $\mathsf{L}$ as $\mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r$, a direct sum of invariant subspaces, and from the fact that a linear transformation with matrix $A$ defines on $\mathsf{L}_i$ a linear transformation with matrix $A_i$.

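Thus it remains only to consider the case that $A$ is a Jordan block, that is,

$$A = \begin{pmatrix} \lambda & 0 & 0 & \cdots & 0 & 0 \\ 1 & \lambda & 0 & \cdots & 0 & 0 \\ 0 & 1 & \lambda & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & \lambda & 0 \\ 0 & 0 & \cdots & 0 & 1 & \lambda \end{pmatrix}. \tag{5.31}$$

It will be convenient to represent it in the form $A = \lambda E + B$, where

$$B = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 & 0 \end{pmatrix}. \tag{5.32}$$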


Let us now write down Taylor's formula for a polynomial of degree $n$:

$$f(x + y) = f(x) + f'(x)y + \frac{f''(x)}{2!}y^2 + \cdots + \frac{f^{(n)}(x)}{n!}y^n. \tag{5.33}$$

We note that for the derivation of formula (5.33), we have to compute the binomial expansion of $(x + y)^k$, $k = 2, \ldots, n$, and then, of course, use commutativity of multiplication of numbers. If the commutative property did not hold, then we would not be able to obtain, for example, the expression $(x + y)^2 = y^2 + 2xy + x^2$, but only $(x + y)^2 = y^2 + yx + xy + x^2$. Therefore, in formula (5.33), we may replace $x$ and $y$ by numbers, but not by arbitrary matrices, instead only those that commute.

Let us substitute in formula (5.33) the arguments $x = \lambda E$ and $y = B$, since the matrices $\lambda E$ and $B$ obviously commute. Since, as is easily verified, $f(\lambda E) = f(\lambda)E$ for an arbitrary polynomial $f$, we obtain the expression

$$f(A) = f(\lambda)E + f'(\lambda)B + \frac{f''(\lambda)}{2!}B^2 + \cdots + \frac{f^{(n)}(\lambda)}{n!}B^n. \tag{5.34}$$

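We now observe that in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ of the cyclic subspace generated by the principal vector $\boldsymbol{e}$ of grade $m$, the transformation $\mathcal{B}$ with matrix $B$ of the form (5.32) assumes the following form:

$$\mathcal{B}(\boldsymbol{e}_i) = \begin{cases} \boldsymbol{e}_{i+1} & \text{for } i \le m - 1, \\ \boldsymbol{0} & \text{for } i > m - 1. \end{cases}$$

Applying the formula $k$ times, we obtain that

$$\mathcal{B}^k(\boldsymbol{e}_i) = \begin{cases} \boldsymbol{e}_{i+k} & \text{for } i \le m - k, \\ \boldsymbol{0} & \text{for } i > m - k. \end{cases}$$

From this, it is clear that the matrix $B^k$ has the following very simple form:

$$B^k = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \end{pmatrix}.$$

In order to describe this in words, we shall call the collection of elements $a_{ij}$ in the matrix $A = (a_{ij})$ with $i = j$ the main diagonal, while the collection of elements $a_{ij}$ with $i - j = k$ (where $k$ is a given number) forming a diagonal parallel to the main diagonal will be called the diagonal lying $k$ steps from the main diagonal. Thus in the matrix $B^k$, the diagonal lying $k$ steps from the main diagonal contains all 1's, while the remaining matrix entries are zero.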

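Formula (5.34) now gives for a Jordan block $A$ of order $m$ the expression

$$f(A) = \begin{pmatrix} \varphi_0 & 0 & 0 & \cdots & 0 & 0 \\ \varphi_1 & \varphi_0 & 0 & \cdots & 0 & 0 \\ \varphi_2 & \varphi_1 & \varphi_0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \varphi_{m-2} & \varphi_{m-3} & \cdots & \cdots & \varphi_0 & 0 \\ \varphi_{m-1} & \varphi_{m-2} & \varphi_{m-3} & \cdots & \varphi_1 & \varphi_0 \end{pmatrix}, \tag{5.35}$$

where $\varphi_k = f^{(k)}(\lambda)/k!$, that is, the numbers $\varphi_k$ are the coefficients in the expansion (5.34).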
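Formula (5.35) can be compared with a direct evaluation of the polynomial at the matrix. The following is a minimal sketch under stated assumptions: the polynomial is given by a list `coeffs` of its coefficients in increasing degree, all helper names are hypothetical, and integer entries are used so that the comparison is exact. It relies on the identity $f^{(k)}(\lambda)/k! = \sum_j c_j \binom{j}{k} \lambda^{j-k}$ for $f(x) = \sum_j c_j x^j$.

```python
from math import comb

def jordan_block(lam, m):
    """Jordan block in the convention of (5.31): 1's just below the diagonal."""
    return [[lam if i == j else 1 if i == j + 1 else 0 for j in range(m)]
            for i in range(m)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_direct(coeffs, A):
    """Evaluate f(A) = sum of coeffs[j] * A^j by Horner's scheme."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c
    return result

def poly_via_535(coeffs, lam, m):
    """Build f(A) for a Jordan block from phi_k = f^(k)(lam) / k!."""
    phi = [sum(c * comb(j, k) * lam ** (j - k)
               for j, c in enumerate(coeffs) if j >= k)
           for k in range(m)]
    return [[phi[i - j] if i >= j else 0 for j in range(m)] for i in range(m)]

coeffs = [1, -3, 0, 2]                  # f(x) = 1 - 3x + 2x^3
print(poly_via_535(coeffs, 2, 3))       # phi_0 = 11, phi_1 = 21, phi_2 = 12
print(poly_direct(coeffs, jordan_block(2, 3)))
```

Both calls print the same lower triangular matrix, with $\varphi_0 = f(2)$ on the main diagonal and $\varphi_k$ on the diagonal lying $k$ steps below it.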

Let us look at a very simple example. Suppose we wish to raise a matrix $A$ of order 2 to a very high power $p$ (for example, $p = 2000$). To perform such calculations by hand seems hopeless. But the theory that we have constructed proves here to be very useful. Let us find an eigenvalue of the linear transformation $\mathcal{A}$ with matrix $A$, that is, a root of the second-degree trinomial $|A - \lambda E|$. Here two cases are possible.


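Case 1. The trinomial $|A - \lambda E|$ has distinct roots $\lambda_1$ and $\lambda_2$. We can easily find the associated eigenvectors $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$, for which

$$(\mathcal{A} - \lambda_1\mathcal{E})(\boldsymbol{e}_1) = \boldsymbol{0}, \qquad (\mathcal{A} - \lambda_2\mathcal{E})(\boldsymbol{e}_2) = \boldsymbol{0}.$$

As we know, the vectors $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$ are linearly independent, and in the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, the transformation $\mathcal{A}$ has the diagonal matrix $\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$. If $C$ is the transition matrix from the original basis in which the transformation $\mathcal{A}$ has matrix $A$ to the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, then

$$A = C^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} C, \tag{5.36}$$

whence is easily obtained for any $p$ (as large as desired), the formula

$$A^p = C^{-1} \begin{pmatrix} \lambda_1^p & 0 \\ 0 & \lambda_2^p \end{pmatrix} C. \tag{5.37}$$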

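Let us now consider the second case.

Case 2. The trinomial $|A - \lambda E|$ has a multiple root $\lambda$ (which therefore must be real). Then the Jordan normal form of the matrix $A$ has the form of a single block $\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}$ or $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$. In the latter variant, the Jordan normal form of the matrix is equal to $\lambda E$, and therefore the matrix $A$ is also equal to $\lambda E$ (this follows, for example, from the fact that if in some basis, a linear transformation has the matrix $\lambda E$, then it will have the same matrix in every other basis as well). Thus in this last variant we are dealing with the previous case, in which $\lambda_1 = \lambda_2 = \lambda$, and the calculation of $A^p$ is obtained by formula (5.37), where we have only to substitute $\lambda$ for $\lambda_1$ and $\lambda_2$. It remains to consider the first variant. For a Jordan block $\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}$, by formula (5.35), we obtain

$$\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}^p = \begin{pmatrix} \lambda^p & 0 \\ p\lambda^{p-1} & \lambda^p \end{pmatrix}.$$

If $\boldsymbol{e}_1, \boldsymbol{e}_2$ are vectors such that

$$(\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_1) \ne \boldsymbol{0}, \qquad \boldsymbol{e}_2 = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_1),$$

then in the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, the matrix of the transformation $\mathcal{A}$ is in Jordan normal form. We denote by $C$ the transition matrix to this basis, and using the transition formula

$$A = C^{-1} \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix} C,$$

we obtain

$$A^p = C^{-1} \begin{pmatrix} \lambda^p & 0 \\ p\lambda^{p-1} & \lambda^p \end{pmatrix} C. \tag{5.38}$$

Formulas (5.37) and (5.38) solve our problem.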
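Case 1 can be checked on a small example with exact arithmetic. The sketch below is illustrative only: the matrix `A`, the helper names, and the matrix `S` whose columns are the eigenvectors are all assumptions of the example. In the notation of (5.36), `S` plays the role of $C^{-1}$, so that (5.37) reads $A^p = S \operatorname{diag}(\lambda_1^p, \lambda_2^p) S^{-1}$.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(S):
    """Inverse of a nonsingular 2x2 matrix in exact arithmetic."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def power_via_eigen(lam1, lam2, S, p):
    """Formula (5.37) with S = C^{-1}: only two scalar powers are needed."""
    D = [[lam1 ** p, 0], [0, lam2 ** p]]
    return matmul(matmul(S, D), inverse2(S))

def power_direct(A, p):
    """Reference computation by p successive multiplications."""
    result = [[1, 0], [0, 1]]
    for _ in range(p):
        result = matmul(result, A)
    return result

A = [[2, 1], [1, 2]]      # the roots of |A - lambda E| are 1 and 3
S = [[1, 1], [-1, 1]]     # columns: eigenvectors (1, -1) and (1, 1)

print(power_via_eigen(1, 3, S, 5))
print(power_via_eigen(1, 3, S, 2000) == power_direct(A, 2000))
```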

We can now apply the same ideas not only to polynomials, but to other functions, for example those given by a convergent power series. Such functions are called analytic. To do this, we need the concept of convergence of a sequence of matrices. Let us recall that the notion of convergence for a sequence of square matrices of a given order with real coefficients was defined earlier, in Sect. 4.4. Moreover, in that same section, we introduced on the set of such matrices the metric $r(A, B)$, thereby converting it to a metric space, on which the notion of convergence is defined automatically (see p. xvii). It is obvious that the metric $r(A, B)$ defined by formulas (4.36) and (4.37) is also a metric on the set of square matrices of a given order with complex coefficients, and therefore transforms it into a metric space.

With this definition, the convergence of a sequence of matrices $A^{(k)} = (a_{ij}^{(k)})$, $k = 1, 2, \ldots$, to a matrix $B = (b_{ij})$ means that $a_{ij}^{(k)} \to b_{ij}$ as $k \to \infty$ for all $i, j$. In this case, we write $A^{(k)} \to B$ for $k \to \infty$ or $\lim_{k \to \infty} A^{(k)} = B$. The matrix $B$ is called the limit of the sequence $A^{(k)}$, $k = 1, 2, \ldots$. Similarly, we can define the limit of a family of matrices $A(h)$ depending on a parameter $h$ assuming values that are not necessarily natural numbers (as was the case for a sequence), but real values, and approaching an arbitrary value $h_0$. By definition, $\lim_{h \to h_0} A(h) = B$ if $\lim_{h \to h_0} r(A(h), B) = 0$. In other words, this means that $\lim_{h \to h_0} a_{ij}(h) = b_{ij}$ for all $i, j$.

Just as in the case of numbers, once we have the notion of convergence of a sequence of matrices, it is possible to talk about the convergence of series of matrices. Without any alteration, we can transfer theorems on series known from analysis to series of matrices. Let the function $f(x)$ be defined by the power series

$$f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_k x^k + \cdots. \tag{5.39}$$

Then by definition,

$$f(A) = \alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k + \cdots. \tag{5.40}$$

Suppose the power series (5.39) converges for $|x| < r$ and the matrix $A$ is in the form of a Jordan block (5.31) with eigenvalue $\lambda$ of absolute value less than $r$. Then, examining the sum of the first $k$ terms of the series (5.40) and passing to the limit $k \to \infty$, we obtain that the series (5.40) converges, and for $f(A)$, formula (5.35) holds. If we now take a matrix $A'$ similar to some Jordan block $A$, that is, related to it by $A' = C^{-1}AC$, where $C$ is some nonsingular matrix, then from the obvious relationship $(C^{-1}AC)^k = C^{-1}A^kC$, we obtain from (5.40) that

$$f(A') = C^{-1}(\alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k + \cdots)C = C^{-1}f(A)C. \tag{5.41}$$

Formulas (5.35) and (5.41) allow us to compute $f(A)$ for any analytic function $f(x)$. Using results from analysis, we can extend the notion of functions of matrices to a wider class of functions (for example, to continuous functions with the help of the theorem on uniform approximation of continuous functions by polynomials). However, we shall not address these questions here.

In applications, of especial importance are exponentials of matrices. We recall that the exponential function of a number $x$ can be defined by the series

$$e^x = 1 + x + \frac{1}{2!}x^2 + \cdots + \frac{1}{k!}x^k + \cdots, \tag{5.42}$$

which, as proved in a course in analysis, converges for all real or complex numbers $x$. According to this, the exponential of a matrix $A$ is defined by the series

$$e^A = E + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{k!}A^k + \cdots, \tag{5.43}$$

which converges for every matrix $A$ with real or complex entries.

Let us verify that if matrices $A$ and $B$ commute, then a basic property of the numerical exponential function is transferred to the matrix exponential function:

$$e^A e^B = e^{A+B}. \tag{5.44}$$

Indeed, substituting into the left-hand side of (5.44) the expressions (5.43) for $e^A$ and $e^B$, removing parentheses, and collecting like terms, we obtain

$$\begin{aligned}
e^A e^B &= \Bigl(E + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots\Bigr)\Bigl(E + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \cdots\Bigr) \\
&= E + (A + B) + \Bigl(\frac{1}{2!}A^2 + AB + \frac{1}{2!}B^2\Bigr) + \Bigl(\frac{1}{3!}A^3 + \frac{1}{2!}A^2B + \frac{1}{2!}AB^2 + \frac{1}{3!}B^3\Bigr) + \cdots \\
&= E + (A + B) + \frac{1}{2!}(A + B)^2 + \frac{1}{3!}(A + B)^3 + \cdots,
\end{aligned}$$

which coincides with the expression (5.43) for $e^{A+B}$. As justification for this computation, it is necessary to note that first of all, as is known from analysis, the numeric series (5.42), to which the matrix series (5.43) corresponds, converges absolutely on the entire real axis (this allows the terms to be summed in arbitrary order), and second, the matrices $A$ and $B$ commute (without this, the last step would be impossible, which we know by virtue of what we discussed earlier on page 177).
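The role of commutativity in (5.44) can be seen numerically by truncating the series (5.43). This is only a sketch: the helper names, the number of terms kept, and the example matrices are assumptions chosen for illustration, and floating-point sums approximate the series rather than compute it exactly.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def expm(A, terms=40):
    """Partial sum E + A + A^2/2! + ... of the series (5.43)."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]  # A^k / k!
        total = matadd(total, term)
    return total

def max_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# A and B commute; P and Q do not.
A, B = [[1, 2], [0, 1]], [[3, 5], [0, 3]]
P, Q = [[0, 1], [0, 0]], [[0, 0], [1, 0]]

print(max_diff(matmul(expm(A), expm(B)), expm(matadd(A, B))))  # close to 0
print(max_diff(matmul(expm(P), expm(Q)), expm(matadd(P, Q))))  # clearly nonzero
```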

In particular, from (5.44) follows the important relationship

$$e^{A(t+s)} = e^{At}e^{As} \tag{5.45}$$

for all numbers $t$ and $s$ and every square matrix $A$. From this, it is easy to derive that

$$\frac{d}{dt}e^{At} = Ae^{At} \tag{5.46}$$

(understanding that differentiation of the matrix function is to be taken elementwise).

Indeed, by the definition of differentiation,

$$\frac{d}{dt}e^{At} = \lim_{h \to 0} \frac{e^{A(t+h)} - e^{At}}{h},$$

while from (5.45), it follows that

$$\frac{e^{A(t+h)} - e^{At}}{h} = \frac{e^{At}e^{Ah} - e^{At}}{h} = e^{At}\,\frac{e^{Ah} - E}{h}.$$

Finally, from (5.43) we easily obtain the equality

$$\lim_{h \to 0} \frac{e^{Ah} - E}{h} = \lim_{h \to 0} h^{-1}\Bigl((Ah) + \frac{1}{2!}(Ah)^2 + \cdots + \frac{1}{k!}(Ah)^k + \cdots\Bigr) = A.$$


All these considerations have numerous applications in the theory of differential equations. Let us consider a system of $n$ linear homogeneous differential equations

$$\frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij}x_j, \qquad i = 1, \ldots, n, \tag{5.47}$$

where $a_{ij}$ are certain constant coefficients and $x_j = x_j(t)$ are unknown differentiable functions of the variable $t$. Similarly to what was done earlier for systems of linear algebraic equations (Example 2.49, p. 62), the system of linear differential equations (5.47) can also be written down compactly in matrix form if we introduce the column vectors

$$\boldsymbol{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \frac{d\boldsymbol{x}}{dt} = \begin{pmatrix} dx_1/dt \\ \vdots \\ dx_n/dt \end{pmatrix}$$

and a square matrix of order $n$ consisting of the coefficients of the system: $A = (a_{ij})$. Then system (5.47) can be written in the form

$$\frac{d\boldsymbol{x}}{dt} = A\boldsymbol{x}. \tag{5.48}$$

The number $n$ is called the order of this system.

For any constant vector $\boldsymbol{x}_0$, let us consider the vector $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$, depending on the variable $t$. This vector satisfies the system (5.48). Indeed, for arbitrary matrices $A(t)$ and $B$ (possibly rectangular, provided that the number of columns of $A(t)$ coincides with the number of rows of $B$), if only the matrix $B$ is constant, one has the equality

$$\frac{d}{dt}\bigl(A(t)B\bigr) = \frac{dA(t)}{dt}B,$$

after which it remains to use relationship (5.46). Similarly, for arbitrary matrices $A(t)$ and $B$, where $B$ is constant and the number of columns of $B$ coincides with the number of rows of $A(t)$, we have the formula

$$\frac{d}{dt}\bigl(BA(t)\bigr) = B\,\frac{dA(t)}{dt}. \tag{5.49}$$

Since with $t = 0$, the matrix $e^{At}$ equals $E$, the solution $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$ satisfies the initial condition $\boldsymbol{x}(0) = \boldsymbol{x}_0$. But the uniqueness theorem proved in the theory of differential equations asserts that for a given $\boldsymbol{x}_0$, such a solution is unique. Thus we may obtain all solutions of the system (5.48) in the form $e^{At}\boldsymbol{x}_0$ if we consider the vector $\boldsymbol{x}_0$ not as fixed, but as taking all possible values in a space of dimension $n$.

Finally, it is also possible to obtain an explicit formula for the solutions. To this end, let us make a linear substitution of variables in the system of equations (5.48) according to the formula $\boldsymbol{y} = C^{-1}\boldsymbol{x}$, where $C$ is a nonsingular constant square matrix of order $n$. Then taking into account relationships (5.49), (5.48), and $\boldsymbol{x} = C\boldsymbol{y}$, we obtain

$$\frac{d\boldsymbol{y}}{dt} = C^{-1}\frac{d\boldsymbol{x}}{dt} = C^{-1}A\boldsymbol{x} = (C^{-1}AC)\boldsymbol{y}. \tag{5.50}$$

Formula (5.50) shows that the matrix $A$ of a system of linear differential equations under a linear replacement of variables changes according to the same law as the matrix of a linear transformation under a suitable change of basis. In accord with what we have done in previous sections, we may choose as $C$ a matrix with whose help the matrix $A$ is converted to Jordan normal form. As a result, the system (5.48) can be rewritten in the form

$$\frac{d\boldsymbol{y}}{dt} = A'\boldsymbol{y}, \tag{5.51}$$

where the matrix $A' = C^{-1}AC$ is in Jordan normal form. Let

$$A' = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}. \tag{5.52}$$

Then system (5.51) is decomposed into $r$ systems

$$\frac{d\boldsymbol{y}_i}{dt} = A_i\boldsymbol{y}_i, \qquad i = 1, \ldots, r,$$


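and for each of these, we can express the solution in the form $e^{A_it}\boldsymbol{x}_0$ and find the matrix $e^{A_it}$ from the relationship (5.35). Here $f(x) = e^{xt}$, and consequently,

$$\varphi_k = \frac{f^{(k)}(\lambda)}{k!} = \frac{t^k}{k!}e^{\lambda t}.$$

This implies that for blocks $A_i$ of the form (5.31) of order $m$, formula (5.35) gives us

$$e^{A_it} = e^{\lambda t} \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ t & 1 & 0 & \cdots & 0 & 0 \\ \frac{t^2}{2!} & t & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \frac{t^{m-2}}{(m-2)!} & \frac{t^{m-3}}{(m-3)!} & \cdots & \cdots & 1 & 0 \\ \frac{t^{m-1}}{(m-1)!} & \frac{t^{m-2}}{(m-2)!} & \frac{t^{m-3}}{(m-3)!} & \cdots & t & 1 \end{pmatrix}. \tag{5.53}$$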
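The closed form (5.53) can be compared with a partial sum of the defining series (5.43). The following is a sketch under stated assumptions: the helper names and the sample values of $\lambda$, $m$, and $t$ are hypothetical, and the series is truncated after a fixed number of terms, which is adequate only for matrices of moderate size.

```python
import math

def jordan_block_times_t(lam, m, t):
    """The matrix A t for a Jordan block A of the form (5.31)."""
    return [[lam * t if i == j else t if i == j + 1 else 0.0 for j in range(m)]
            for i in range(m)]

def exp_jordan_block(lam, m, t):
    """Formula (5.53): entry (i, j) is e^{lam t} t^(i-j) / (i-j)! for i >= j."""
    scale = math.exp(lam * t)
    return [[scale * t ** (i - j) / math.factorial(i - j) if i >= j else 0.0
             for j in range(m)] for i in range(m)]

def exp_series(M, terms=60):
    """Partial sum of E + M + M^2/2! + ... for a square matrix M."""
    n = len(M)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[sum(term[i][l] * M[l][j] for l in range(n)) / k
                 for j in range(n)] for i in range(n)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

lam, m, t = -0.5, 4, 1.3
closed = exp_jordan_block(lam, m, t)
series = exp_series(jordan_block_times_t(lam, m, t))
error = max(abs(closed[i][j] - series[i][j]) for i in range(m) for j in range(m))
print(error)   # of the order of rounding error
```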


This implies that the solutions of the system (5.48) can be decomposed into series whose lengths are equal to the orders of the Jordan blocks in the representation (5.52), and for a block of order $m$, all solutions of the given series can be expressed as linear combinations (with constant coefficients) of the functions

$$e^{\lambda t},\ te^{\lambda t},\ \ldots,\ t^{m-1}e^{\lambda t}. \tag{5.54}$$

It is easily verified that the collection of solutions of system (5.48) forms a vector space, where the addition of two vectors and multiplication of a vector by a scalar are defined just as were addition and multiplication by a scalar of the corresponding functions. The set of functions (5.54) forms a basis of the space of solutions of the system (5.48). In the theory of differential equations, such a set is called a fundamental system of solutions.

In conclusion, let us say a few words about linear differential equations with real coefficients in the plane ($n = 2$) (that is, assuming that in system (5.48), the matrix $A$ and vector $\boldsymbol{x}$ are real). Here, we should distinguish four possibilities for the matrix $A$ and roots of the polynomial $|A - \lambda E|$:

(a) The roots are real and distinct: ($\alpha$ and $\beta$).

(b) There is a multiple root $\alpha$ (necessarily real) and $A = \alpha E$.

(c) There is a multiple root $\alpha$, but $A \ne \alpha E$.

(d) The roots are complex conjugate: $a + ib$ and $a - ib$ (here $i^2 = -1$ and $b \ne 0$).

In each of these cases, the matrix $A$ can be brought (by multiplication on the left by $C^{-1}$ and on the right by $C$, where $C$ is some nonsingular real matrix) into the following normal forms:

$$\text{(a)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad \text{(b)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \alpha \end{pmatrix}, \qquad \text{(c)}\ \begin{pmatrix} \alpha & 0 \\ 1 & \alpha \end{pmatrix}, \qquad \text{(d)}\ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$







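The solution $\boldsymbol{x}(t)$ of the associated differential equation is obtained in the form $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$, where $\boldsymbol{x}_0 = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$ is the vector of the original data. Further, we can use formula (5.53), considering that the matrix $A$ of the system has the normal form (a), (b), (c), or (d). Here in cases (a)-(c), we will obtain

$$\text{(a)}\ \boldsymbol{x}(t) = \begin{pmatrix} e^{\alpha t}c_1 \\ e^{\beta t}c_2 \end{pmatrix}, \qquad \text{(b)}\ \boldsymbol{x}(t) = \begin{pmatrix} e^{\alpha t}c_1 \\ e^{\alpha t}c_2 \end{pmatrix}, \tag{5.55}$$

$$\text{(c)}\ \boldsymbol{x}(t) = \begin{pmatrix} c_1e^{\alpha t} \\ c_1te^{\alpha t} + c_2e^{\alpha t} \end{pmatrix}. \tag{5.56}$$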


In case (d), we obtain $\boldsymbol{x}(t) = e^{At}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$, where $A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$. In Example 4.2 (p. 134) we established that $A$ is the matrix of a linear transformation of the plane $\mathbb{C}$ with complex variable $z$ that multiplies $z$ by the complex number $a + ib$. This means, by the definition of the exponential function, that $e^{At}$ is the matrix of multiplication of $z$ by the complex number $e^{(a+ib)t}$. By Euler's formula,

$$e^{(a+ib)t} = e^{at}(\cos bt + i\sin bt) = p + iq,$$

where $p = e^{at}\cos bt$ and $q = e^{at}\sin bt$. Thus we obtain a linear transformation of the real plane $\mathbb{C}$ with complex variable $z$ that multiplies each complex number $z \in \mathbb{C}$ by the given complex number $p + iq$. As we saw in Example 4.2, the matrix of such a linear transformation has the form (4.2). Multiplying it by the column vector $\boldsymbol{x}_0$ of the original data and substituting the expressions $p = e^{at}\cos bt$ and $q = e^{at}\sin bt$, we obtain our final formula:

$$\text{(d)}\ \boldsymbol{x}(t) = e^{at}\begin{pmatrix} c_1\cos bt - c_2\sin bt \\ c_1\sin bt + c_2\cos bt \end{pmatrix}. \tag{5.57}$$

The plane of variables $(x_1, x_2)$ is called the phase plane of the system (5.48) for $n = 2$. Formulas (5.55)-(5.57) define (in parametric form) certain curves in the phase plane, where to each pair of values $c_1, c_2$ there corresponds in general a curve passing through the point $(c_1, c_2)$ of the phase plane for $t = 0$. These oriented curves (the orientation is given by the direction of motion corresponding to an increase in the parameter $t$) are called phase curves of system (5.48), and the collection of all phase curves corresponding to all possible values of $c_1, c_2$ is called the phase portrait of the system. Let us pose the following question: What does the phase portrait of the system (5.48) look like in cases (a)-(d)?

First of all, we note that among all solutions $\boldsymbol{x}(t)$ there is always the constant solution $\boldsymbol{x}(t) \equiv \boldsymbol{0}$. It is obtained by substituting in formulas (5.55)-(5.57) the initial values $c_1 = c_2 = 0$. The phase curve corresponding to this solution is simply the point $x_1 = x_2 = 0$. Constant solutions (and their corresponding phase curves, points in the phase plane) are called singular points or equilibrium points or fixed points of the differential equation.² Just as the study of a function usually begins with a search for its extreme points, so a study of a differential equation usually begins with a search for its singular points.

Are there singular points of system (5.48) other than $x_1 = x_2 = 0$? Singular points are the constant solutions of a system of equations, and since the derivative of a constant solution is identically equal to zero (that is, the left-hand side of system (5.48) is identically zero), this means that the right-hand side of system (5.48) must also be identically equal to zero. Therefore, singular points are precisely the solutions of the system of linear homogeneous equations $A\boldsymbol{x} = \boldsymbol{0}$. If the matrix $A$ is nonsingular, then the system $A\boldsymbol{x} = \boldsymbol{0}$ has no solutions other than the null solution, and therefore, system (5.48) has no singular points other than $x_1 = x_2 = 0$. If the matrix $A$ is singular and its rank is equal to 1, then system (5.48) has an infinite number of singular points lying on a line in the phase plane. But in the case that the rank of the matrix $A$ is equal to 0, all points of the phase plane are singular points.

² This name comes from the fact that if at some moment in time, a material point whose motion is described by system (5.48) is located at a singular point, then it will remain there forever.

In the sequel, we will assume that the matrix $A$ is nonsingular and examine what sorts of phase portraits correspond to it in the cases (a)-(d) presented above. In all the figures, the $x$-axis corresponds to the variable $x_1$, while the $y$-axis represents the variable $x_2$.

(a) The roots $\alpha$ and $\beta$ are real and distinct. In this case, there are three possibilities: $\alpha$ and $\beta$ have different signs, both are negative, or both are positive.

(a.1) If $\alpha$ and $\beta$ have different signs, then a singular point is called a saddle. For definiteness, let us assume that $\alpha < 0$ and $\beta > 0$. To the initial value $c_1 \ne 0$, $c_2 = 0$ there corresponds the solution $x_1(t) = c_1e^{\alpha t}$, $x_2(t) = 0$, passing through the point $(c_1, 0)$ at $t = 0$. The associated phase curve is the horizontal ray $x_1 > 0$, $x_2 = 0$ (if $c_1 > 0$) or $x_1 < 0$, $x_2 = 0$ (if $c_1 < 0$) such that the direction along the curve with increasing $t$ is toward the singular point $x_1 = x_2 = 0$.

Similarly, to the initial point $c_1 = 0$, $c_2 \ne 0$ corresponds the solution $x_1(t) = 0$, $x_2(t) = c_2e^{\beta t}$, passing through the point $(0, c_2)$ at $t = 0$. The associated phase curve is the vertical ray $x_1 = 0$, $x_2 > 0$ (if $c_2 > 0$) or $x_1 = 0$, $x_2 < 0$ (if $c_2 < 0$) such that the direction along the curve for increasing $t$ is away from the singular point $x_1 = x_2 = 0$.

Thus there are two phase curves asymptotically approaching the singular point as $t \to +\infty$ (they are called stable separatrices), and two curves approaching it for $t \to -\infty$ (they are called unstable separatrices). Let us make one crucial observation: from the fact that $e^{\alpha t} \to 0$ for $t \to +\infty$ and $e^{\beta t} \to 0$ for $t \to -\infty$, it follows that stable and unstable separatrices approach a saddle arbitrarily closely as $t \to +\infty$ and $t \to -\infty$ respectively but never reach it in finite time.

The stable and unstable separatrices of a saddle partition the phase plane into four sectors. In our case (in which the matrix of system (5.48) is in Jordan form), the separatrices lie on the coordinate axes, and therefore, these sectors coincide with the Cartesian quadrants. Let us see how the remaining phase curves behave with respect to the initial values $c_1 \ne 0$, $c_2 \ne 0$. We observe first that if the initial point $(c_1, c_2)$ lies in any of the four sectors, then after passing through it for $t = 0$, the phase curve remains in that sector for all values of $t$. This follows obviously from the fact that the functions $x_1(t) = c_1e^{\alpha t}$ and $x_2(t) = c_2e^{\beta t}$ are of fixed sign.

For definiteness, let us consider the first quadrant $c_1 > 0$, $c_2 > 0$ (the other cases
can be obtained from this one by a symmetry transformation with respect to the $x$-
or $y$-axis or with respect to the origin). Let us raise the function $x_1(t) = c_1 e^{\alpha t}$ to the
$\beta$ power, and the function $x_2(t) = c_2 e^{\beta t}$ to the $\alpha$ power. After dividing one by the
other and canceling the factor $e^{\alpha\beta t}$, we obtain the relationship

$$\frac{x_1^{\beta}}{x_2^{\alpha}} = c, \qquad (5.58)$$

where the constant $c = c_1^{\beta}/c_2^{\alpha}$ is determined by the initial values $c_1, c_2$. Since the numbers
$\alpha$ and $\beta$ have opposite signs, the phase curve in the plane $(x_1, x_2)$ corresponding
to this equation has a form similar to a hyperbola. This phase curve passes at some
positive distance from the singular point $x_1 = x_2 = 0$, asymptotically approaching
one of the unstable separatrices as $t \to +\infty$ and one of the stable separatrices as
$t \to -\infty$. Such phase curves are said to be of hyperbolic or saddle type.

Fig. 5.1 Saddle and nodes
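The relationship (5.58) can be checked numerically. The following is a minimal sketch, not part of the original lectures: the values $\alpha = -1$, $\beta = 2$, $c_1 = 3$, $c_2 = 0.5$ and the function names are assumptions chosen only for illustration. It evaluates $x_1^{\beta}/x_2^{\alpha}$ at several times along one solution in the first quadrant.

```python
import math

def saddle_solution(c1, c2, alpha, beta, t):
    """Solution x1(t) = c1*exp(alpha*t), x2(t) = c2*exp(beta*t) with initial point (c1, c2)."""
    return c1 * math.exp(alpha * t), c2 * math.exp(beta * t)

def invariant(x1, x2, alpha, beta):
    """Left-hand side x1**beta / x2**alpha of relationship (5.58); first quadrant only."""
    return x1 ** beta / x2 ** alpha

# Illustrative values (assumptions, not taken from the text): alpha < 0 < beta.
alpha, beta = -1.0, 2.0
c1, c2 = 3.0, 0.5

c = invariant(c1, c2, alpha, beta)  # the constant c fixed by the initial values
values = [invariant(*saddle_solution(c1, c2, alpha, beta, t), alpha, beta)
          for t in (-2.0, -0.5, 0.0, 1.0, 3.0)]
```

Up to rounding error, every entry of `values` agrees with `c`, which is what it means for the phase curve to lie on the level set described by (5.58).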

Thus in the case of a saddle, we have two stable separatrices approaching the
singular point as $t \to +\infty$ and two unstable separatrices approaching it as $t \to -\infty$,
and also an infinite number of saddle-type phase curves filling the four sectors into
which the separatrices divide the phase plane. The associated phase portrait is shown
in Fig. 5.1.

(a.2) If $\alpha$ and $\beta$ have the same sign, then a singular point is called a node.
Moreover, if $\alpha$ and $\beta$ are negative, then the node is said to be stable, while if $\alpha$ and $\beta$
are positive, the node is unstable. The reason for this terminology will soon become
clear.

For definiteness, we will restrict our examination to stable nodes (unstable nodes
are studied similarly), that is, we shall assume that the numbers $\alpha$ and $\beta$ are negative.
As in the case of a saddle, the phase curve corresponding to the initial value $c_1 \neq 0$,
$c_2 = 0$ is the horizontal ray $x_1 > 0$, $x_2 = 0$ (if $c_1 > 0$) or $x_1 < 0$, $x_2 = 0$ (if $c_1 < 0$)
such that the direction along the curve for increasing $t$ is toward the singular point.
The phase curve corresponding to the initial value $c_1 = 0$, $c_2 \neq 0$ is the vertical ray
$x_1 = 0$, $x_2 > 0$ (if $c_2 > 0$) or $x_1 = 0$, $x_2 < 0$ (if $c_2 < 0$) such that the direction along
the curve for increasing $t$ is also toward the singular point.

As in the case of a saddle, it is clear that if the initial point $(c_1, c_2)$ lies in one
of the four quadrants, then the phase curve passing through it for $t = 0$ remains in
that quadrant for all values of $t$. Let us consider the first quadrant $c_1 > 0$, $c_2 > 0$.
Proceeding as we did in the case of a saddle, we again obtain the equation (5.58). But
now the numbers $\alpha$ and $\beta$ have the same sign, and the phase curve corresponding
to this equation has quite a different form from that in the case of a saddle. After
a transformation of (5.58), we obtain the power function $x_1 = c^{1/\beta} x_2^{\alpha/\beta}$. If
$|\alpha| > |\beta|$, then the exponent $\alpha/\beta$ is greater than 1, and the graph of this function is
similar to a branch of the parabola $x_1 = x_2^2$. However, if $|\alpha| < |\beta|$, then the exponent
$\alpha/\beta$ is less than 1, and the graph of the function looks like a branch of the parabola
$x_2 = x_1^2$. Thus in the case of a stable node, all the phase curves approach the singular
point as $t \to +\infty$, while for $t \to -\infty$, they move away from it (for an unstable node
we must exchange the positions of $+\infty$ and $-\infty$). Such phase curves are called
parabolic. Phase portraits of stable and unstable nodes are depicted in Fig. 5.1.

Fig. 5.2 Dicritical and Jordan nodes

It is now possible to explain the terminology stable and unstable. If a material 
point was located at an equilibrium point that was a stable node and was brought 
out from that point by some external action, then moving along the curve depicted 
in the phase portrait, it will strive to return to that position. But if it was an unstable 
node, then a material point brought out from an equilibrium point not only would 
not strive to return to that position, but on the contrary, it would move away from it 
with exponentially increasing speed. 

(b) If a matrix $A$ is similar to the matrix $\alpha E$, then a singular point is called a
dicritical node or bicritical node. Proceeding in the same way as before, we obtain
the relationship (5.58) with $\beta = \alpha$, from which follows the equation $x_1/x_2 = c_1/c_2$.
All the phase curves are rays with origin at $x_1 = x_2 = 0$. Moreover, if $\alpha < 0$, then
motion along them as $t \to +\infty$ proceeds toward the equilibrium point $x_1 = x_2 = 0$,
while if $\alpha > 0$, then away from it. Thus in the case $\alpha < 0$ ($\alpha > 0$), we have a stable
(unstable) dicritical node. The phase portrait of a stable dicritical node is depicted
in Fig. 5.2. In the case of an unstable dicritical node, it is necessary only to change
the directions of the arrows to their opposite.

(c) If the solution to the equation is given by formula (5.56), then a singular point
is called a Jordan node. If $\alpha < 0$, then the Jordan node is stable, and if $\alpha > 0$, then
it is unstable. For $c_1 \neq 0$, $c_2 = 0$, we obtain two phase curves, namely the horizontal
rays $x_1 > 0$, $x_2 = 0$ and $x_1 < 0$, $x_2 = 0$, whose motion is in the direction of the
singular point for $\alpha < 0$ and away from the singular point for $\alpha > 0$. In the
investigation of phase curves for $c_2 \neq 0$, one must study the properties of the functions
$x_1(t) = c_1 e^{\alpha t}$ and $x_2(t) = (c_1 t + c_2) e^{\alpha t}$ for $c_1 > 0$ and for $c_1 < 0$. As a result, for a
stable (unstable) Jordan node, one obtains the phase portrait depicted in Fig. 5.2. All
the phase curves (except the two vertical rays) look like pieces of a parabola, each
of which lies entirely either in the right or left half-plane and intersects the $x$-axis in
a single point.

(d) The roots are complex conjugates: $a + ib$ and $a - ib$, where $b \neq 0$. Here it is
necessary to consider two cases: $a \neq 0$ and $a = 0$.

Fig. 5.3 Foci and center

(d.1) If $a \neq 0$, then a singular point is called a focus. In order to visualize the
behavior of phase curves given by formula (5.57), we observe that the vector $\boldsymbol{x}(t)$ is
obtained from the vector $\boldsymbol{x}_0$ with coordinates $(c_1, c_2)$ by rotating it through the angle
$bt$ and multiplying by $e^{at}$. Therefore, the phase curves are spirals that “wind” around
the singular point $x_1 = x_2 = 0$ as $t \to +\infty$ (if $a < 0$) or as $t \to -\infty$ (if $a > 0$). For
$a < 0$ and $a > 0$, a focus is said to be stable or unstable respectively. The direction
of motion along the spirals (clockwise or counterclockwise) is determined by the
sign of the number $b$. In Fig. 5.3 are shown phase portraits of a stable focus ($a < 0$)
and an unstable focus ($a > 0$) in the case $b > 0$, that is, the case in which the motion
along the spirals is counterclockwise.

(d.2) If $a = 0$, then the singular point $x_1 = x_2 = 0$ is called a center. Relationship
(5.57) defines in this case a rotation of the vector $\boldsymbol{x}_0$ through the angle $bt$. The
phase curves are concentric circles with common center $x_1 = x_2 = 0$ along which
the motion is either clockwise or counterclockwise according to the sign of the
number $b$. The phase portrait of a center (for the case $b > 0$) is shown in Fig. 5.3.
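The case analysis (a) through (d) can be summarized as a small decision procedure. The sketch below is an illustration under stated assumptions, not the book's method: it takes the entries of a real $2 \times 2$ matrix $A$, uses the standard facts that the trace of $A$ is the sum of the characteristic roots and the determinant is their product, and treats a zero root as a degenerate case that the classification above does not cover. The function name and the tolerance `eps` are hypothetical.

```python
def classify_singular_point(a11, a12, a21, a22, eps=1e-12):
    """Classify the singular point x1 = x2 = 0 of x' = A x for a real 2x2 matrix A."""
    trace = a11 + a22                  # sum of the characteristic roots
    det = a11 * a22 - a12 * a21        # product of the characteristic roots
    disc = trace * trace - 4.0 * det   # discriminant of the characteristic polynomial
    if abs(det) <= eps:
        return "degenerate (zero root, not covered by cases (a)-(d))"
    if disc > eps:                     # real distinct roots alpha, beta
        if det < 0:
            return "saddle"            # roots of opposite signs
        return "stable node" if trace < 0 else "unstable node"
    if disc < -eps:                    # complex conjugate roots a + ib, a - ib
        if abs(trace) <= eps:
            return "center"            # real part a = trace / 2 vanishes
        return "stable focus" if trace < 0 else "unstable focus"
    # Double real root alpha = trace / 2: A is similar to alpha*E only if A = alpha*E.
    stability = "stable" if trace < 0 else "unstable"
    is_scalar = abs(a12) <= eps and abs(a21) <= eps and abs(a11 - a22) <= eps
    return f"{stability} {'dicritical' if is_scalar else 'Jordan'} node"
```

For instance, `classify_singular_point(-1, 0, 0, 2)` reports a saddle, and `classify_singular_point(0, -1, 1, 0)` reports a center. The comparison with `eps` is a floating-point convenience; with exact arithmetic one would compare with zero directly.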


Chapter 6 

Quadratic and Bilinear Forms 


6.1 Basic Definitions


Definition 6.1 A quadratic form in $n$ variables $x_1, \dots, x_n$ is a homogeneous
second-degree polynomial in these variables. Therefore, only terms of degree two
enter into this polynomial; that is, the terms are monomials of the form $\varphi_{ij} x_i x_j$ for
all possible values of $i, j = 1, \dots, n$, and so the polynomial has the form

$$\psi(x_1, \dots, x_n) = \sum_{i,j=1}^{n} \varphi_{ij} x_i x_j. \qquad (6.1)$$


We note that in expression (6.1), there are like terms, such as $x_i x_j = x_j x_i$. We
shall decide later how to deal with them.

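Of course, every quadratic form (6.1) can be viewed as a function of the vector
$\boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_n \boldsymbol{e}_n$, where $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ is some fixed basis of the vector space $\mathsf{L}$ of
dimension $n$. We shall write this as

$$\psi(\boldsymbol{x}) = \sum_{i,j=1}^{n} \varphi_{ij} x_i x_j. \qquad (6.2)$$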

The given definition of quadratic form obviously is compatible with the more
general definition of form of arbitrary degree given in Sect. 3.8 (see p. 127). We
recall that in that section, a form of degree $k$ was defined as a function $F(\boldsymbol{x})$ of the
vector $\boldsymbol{x} \in \mathsf{L}$, where $F(\boldsymbol{x})$ is written as a homogeneous polynomial of degree $k$ in
coordinates $x_1, \dots, x_n$ in some (and hence any) basis of this vector space. Thus for
$k = 2$, we obtain the above definition of quadratic form.

By a change in coordinates, that is, by a choice of another basis of the space $\mathsf{L}$, a
quadratic form $\psi(\boldsymbol{x})$ will be written as previously in the form (6.2) with some other
coefficients $\varphi_{ij}$.

Quadratic forms have the property of being very similar to linear functions, and in
the sequel, we shall unite the theory of quadratic forms with that of linear functions
and transformations. The following notion will serve as a foundation for this.

Definition 6.2 A function $\varphi(\boldsymbol{x}, \boldsymbol{y})$ that assigns to two vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$ a scalar value
is called a bilinear form on $\mathsf{L}$ if it is linear in each of its arguments, that is, if for
every fixed $\boldsymbol{y} \in \mathsf{L}$, the function $\varphi(\boldsymbol{x}, \boldsymbol{y})$ as a function of $\boldsymbol{x}$ is linear on $\mathsf{L}$ and for each
fixed $\boldsymbol{x} \in \mathsf{L}$, the function $\varphi(\boldsymbol{x}, \boldsymbol{y})$ as a function of $\boldsymbol{y}$ is linear on $\mathsf{L}$.

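In other words, the following conditions must be satisfied for all vectors of the
space $\mathsf{L}$ and scalars $\alpha$:

$$\begin{aligned}
\varphi(\boldsymbol{x}_1 + \boldsymbol{x}_2, \boldsymbol{y}) &= \varphi(\boldsymbol{x}_1, \boldsymbol{y}) + \varphi(\boldsymbol{x}_2, \boldsymbol{y}), & \varphi(\alpha \boldsymbol{x}, \boldsymbol{y}) &= \alpha \varphi(\boldsymbol{x}, \boldsymbol{y}), \\
\varphi(\boldsymbol{x}, \boldsymbol{y}_1 + \boldsymbol{y}_2) &= \varphi(\boldsymbol{x}, \boldsymbol{y}_1) + \varphi(\boldsymbol{x}, \boldsymbol{y}_2), & \varphi(\boldsymbol{x}, \alpha \boldsymbol{y}) &= \alpha \varphi(\boldsymbol{x}, \boldsymbol{y}).
\end{aligned} \qquad (6.3)$$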

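If the space $\mathsf{L}$ consists of rows, we have a special case of the notion of multilinear
function, which was introduced in Sect. 2.7 (for $m = 2$).

If $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ is some basis of $\mathsf{L}$, then we can write

$$\boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_n \boldsymbol{e}_n, \qquad \boldsymbol{y} = y_1 \boldsymbol{e}_1 + \cdots + y_n \boldsymbol{e}_n,$$

and using equations (6.3), we obtain a formula that expresses (in the chosen basis)
the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ in terms of the coordinates of the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$:

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \sum_{i,j=1}^{n} \varphi_{ij} x_i y_j, \quad \text{where } \varphi_{ij} = \varphi(\boldsymbol{e}_i, \boldsymbol{e}_j). \qquad (6.4)$$

In this case, the square matrix $\Phi = (\varphi_{ij})$ is called the matrix of the bilinear form $\varphi$
in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$. In the case that $\boldsymbol{x}$ and $\boldsymbol{y}$ are rows, this formulation represents
a special way of writing an arbitrary multilinear function as introduced in Sect. 2.7
(Theorem 2.29).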

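The relationship (6.4) shows that the value of $\varphi(\boldsymbol{x}, \boldsymbol{y})$ can be expressed in terms
of the elements of the matrix $\Phi$ and the coordinates of the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ in the
basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$, which means that a bilinear form, as a function of the arguments $\boldsymbol{x}$
and $\boldsymbol{y}$, is completely defined by its matrix $\Phi$. This same formula shows that if we
replace the argument $\boldsymbol{y}$ in the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ by $\boldsymbol{x}$, where $\boldsymbol{x} = (x_1, \dots, x_n)$,
we obtain the quadratic form $\psi(\boldsymbol{x}) = \varphi(\boldsymbol{x}, \boldsymbol{x})$, and moreover, any quadratic form
(6.1) can be obtained in this way; to do so, we need only choose a bilinear form
$\varphi(\boldsymbol{x}, \boldsymbol{y})$ with matrix $\Phi = (\varphi_{ij})$ satisfying the condition $\varphi(\boldsymbol{e}_i, \boldsymbol{e}_j) = \varphi_{ij}$, where $\varphi_{ij}$
are the coefficients from (6.1).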
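Formula (6.4) translates directly into a few lines of code. The following sketch is only an illustration: the matrix `Phi` and the coordinate rows `x`, `y` are assumed sample data, and the function names are hypothetical. It evaluates a bilinear form from its matrix and obtains the quadratic form by setting the second argument equal to the first.

```python
def bilinear(Phi, x, y):
    """Value of phi(x, y) = sum over i, j of Phi[i][j] * x[i] * y[j], as in formula (6.4)."""
    n = len(Phi)
    return sum(Phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def quadratic(Phi, x):
    """Quadratic form psi(x) = phi(x, x) obtained by replacing y with x."""
    return bilinear(Phi, x, x)

# Illustrative data (assumed, not from the text); the matrix is deliberately not symmetric.
Phi = [[1, 2],
       [0, 3]]
x, y = [1, 1], [2, -1]

value = bilinear(Phi, x, y)
# The entry Phi[i][j] is recovered as phi(e_i, e_j) on the basis vectors.
phi_12 = bilinear(Phi, [1, 0], [0, 1])
```

With these numbers `value` is $2 - 2 + 0 - 3 = -3$ and `quadratic(Phi, x)` is $1 + 2 + 0 + 3 = 6$.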

It is easily seen that the set of bilinear forms on a vector space $\mathsf{L}$ is itself a vector
space if we define on it in a natural way the operations of addition of bilinear forms
and multiplication by a scalar. Clearly, the null vector in such a space is the bilinear
form that is identically equal to zero.

The connection between the notion of bilinear form and that of linear
transformation is based on the following result, which uses the notion of dual space.

Theorem 6.3 There is an isomorphism between the space of bilinear forms $\varphi$ on
the vector space $\mathsf{L}$ and the space $\mathcal{L}(\mathsf{L}, \mathsf{L}^*)$ of linear transformations $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$.

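Proof Let $\varphi(\boldsymbol{x}, \boldsymbol{y})$ be a bilinear form on $\mathsf{L}$. Let us associate with it the linear
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$ as follows. By definition, $\mathcal{A}$ should assign to a vector $\boldsymbol{y} \in \mathsf{L}$ a
linear function $f(\boldsymbol{x})$ on $\mathsf{L}$. We shall make this assignment by setting $f(\boldsymbol{x}) = \varphi(\boldsymbol{x}, \boldsymbol{y})$.
The verification that the transformation $\mathcal{A}$ thus defined is linear is trivial.

It is equally trivial to verify that the correspondence $\varphi \mapsto \mathcal{A}$ is a bijection. We
shall simply point out the inverse transformation of the set $\mathcal{L}(\mathsf{L}, \mathsf{L}^*)$ into the set of
bilinear forms. Let $\mathcal{A}$ be a linear transformation from $\mathsf{L}$ to $\mathsf{L}^*$ that to each vector
$\boldsymbol{x} \in \mathsf{L}$ assigns the linear function $\mathcal{A}(\boldsymbol{x}) \in \mathsf{L}^*$. This function takes the value $\mathcal{A}(\boldsymbol{x})(\boldsymbol{y})$
on the vector $\boldsymbol{y}$, which we shall denote by $\varphi(\boldsymbol{x}, \boldsymbol{y})$. Using the notation established in
Sect. 3.7 (p. 125) and keeping in mind that in this situation, $\mathsf{M} = \mathsf{L}^*$, we may write
$\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$ for arbitrary vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$.

Finally, it is completely obvious that the constructed mapping $\varphi \mapsto \mathcal{A}$ is an
isomorphism of vector spaces, that is, it satisfies the conditions $\varphi_1 + \varphi_2 \mapsto \mathcal{A}_1 + \mathcal{A}_2$
and $\lambda \varphi \mapsto \lambda \mathcal{A}$, where $\varphi_i \mapsto \mathcal{A}_i$ and $\lambda$ is an arbitrary scalar. □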

It follows from this theorem that the study of bilinear forms is analogous to that
of linear transformations $\mathsf{L} \to \mathsf{L}$ (although somewhat simpler). In mathematics and
physics, a special role is played by two particular types of bilinear form.


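Definition 6.4 A bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is said to be symmetric if

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \varphi(\boldsymbol{y}, \boldsymbol{x}), \qquad (6.5)$$

and antisymmetric if

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = -\varphi(\boldsymbol{y}, \boldsymbol{x}), \qquad (6.6)$$

for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$.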

We encountered special cases of both these concepts in Chap. 2, when the vectors
$\boldsymbol{x}$ and $\boldsymbol{y}$ were taken to be rows of numbers.

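If following Theorem 6.3, we express the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ in the form

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})) \qquad (6.7)$$

with some linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$, then the symmetry condition (6.5)
indicates that $(\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})) = (\boldsymbol{y}, \mathcal{A}(\boldsymbol{x}))$. Since $(\boldsymbol{y}, \mathcal{A}(\boldsymbol{x})) = (\boldsymbol{x}, \mathcal{A}^*(\boldsymbol{y}))$, where $\mathcal{A}^* :
\mathsf{L}^{**} \to \mathsf{L}^*$ is the linear transformation dual to $\mathcal{A}$ (see p. 125), it can be rewritten
in the form $(\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})) = (\boldsymbol{x}, \mathcal{A}^*(\boldsymbol{y}))$. Since this relationship must be satisfied for all
vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$, it can be rewritten in the form $\mathcal{A} = \mathcal{A}^*$. Note that in view of the
equality $\mathsf{L}^{**} = \mathsf{L}$, both $\mathcal{A}$ and $\mathcal{A}^*$ are transformations from $\mathsf{L}$ to $\mathsf{L}^*$. Similarly, the
antisymmetry condition (6.6) of the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ can be written in the form
$\mathcal{A} = -\mathcal{A}^*$.

Let us note that it suffices to verify the symmetry condition (6.5) and antisymmetry
condition (6.6) for vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ belonging to some particular basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$
of the space $\mathsf{L}$. Indeed, if this condition is satisfied for vectors in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$,
that is, for example, in the case of symmetry, the equations $\varphi(\boldsymbol{e}_i, \boldsymbol{e}_j) = \varphi(\boldsymbol{e}_j, \boldsymbol{e}_i)$ are
satisfied for all $i, j = 1, \dots, n$, then from formula (6.4), it follows that the condition
(6.5) is met for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$. Recalling the definition of a matrix of a bilinear
form, we see that the form $\varphi$ is symmetric if and only if its matrix $\Phi$ is symmetric
in some basis of the space $\mathsf{L}$ (that is, $\Phi = \Phi^*$). Similarly, the antisymmetry of the
bilinear form $\varphi$ is equivalent to the antisymmetry of $\Phi$ in some basis ($\Phi = -\Phi^*$).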

The matrix $\Phi$ of a bilinear form depends on the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$. We shall now
investigate this dependence. Here, we shall use the formula (3.38) for changing
coordinates that we derived in Sect. 3.4, and moreover, our reasoning will be similar
to what we used then in deriving this formula.

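First of all, let us write down the relationship (6.4) in a more compact matrix
form. To this end, we observe that for rows $\boldsymbol{x} = (x_1, \dots, x_n)$ and columns

$$[\boldsymbol{y}] = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$

the sum in formula (6.4) can be rewritten in the following form:

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = \sum_{i=1}^{n} x_i \Bigl( \sum_{j=1}^{n} \varphi_{ij} y_j \Bigr) = \sum_{i=1}^{n} x_i z_i, \quad \text{where } z_i = \sum_{j=1}^{n} \varphi_{ij} y_j.$$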


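By the rule of matrix multiplication, we obtain the expression

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = \boldsymbol{x}[\boldsymbol{z}], \quad \text{where } [\boldsymbol{z}] = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = \Phi[\boldsymbol{y}].$$

This means that we now have

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = \boldsymbol{x} \Phi [\boldsymbol{y}].$$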

Let us note that by similar arguments, or by simply taking the transpose of both
sides of the previous equality (on the left-hand side of which stands a scalar, that is,
a matrix of type (1, 1), which is invariant under the transpose operation), we obtain
a similar relationship

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = \boldsymbol{y} \Phi^* [\boldsymbol{x}].$$

Thus if in some basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$, the matrix of the bilinear form $\varphi$ is equal to $\Phi$,
while the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ have coordinates $x_i$ and $y_i$, then we have the following
formula:

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x} \Phi [\boldsymbol{y}]. \qquad (6.8)$$

Similarly, for another basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$, we obtain the equality

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x}' \Phi' [\boldsymbol{y}'], \qquad (6.9)$$

where $\Phi'$ is the matrix of the bilinear form $\varphi$, while $x'_i$ and $y'_i$ are the coordinates of
the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ in the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$.

Let $C$ be the transition matrix from the basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_n$ to the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$.
Then by the substitution formula (3.36), we obtain the relationships $\boldsymbol{x} = \boldsymbol{x}' C^*$ and
$[\boldsymbol{y}] = C[\boldsymbol{y}']$. Substituting these expressions into (6.8), taking into account formula
(6.9), we obtain the identity

$$\boldsymbol{x}' C^* \Phi C [\boldsymbol{y}'] = \boldsymbol{x}' \Phi' [\boldsymbol{y}'],$$

which is satisfied for all $\boldsymbol{x}'$ and $[\boldsymbol{y}']$. From this, it follows that the matrices $\Phi$ and
$\Phi'$ of the bilinear form $\varphi$ in these bases are related by the equality

$$\Phi' = C^* \Phi C. \qquad (6.10)$$

This is the substitution formula for the matrix of a bilinear form for a change of
basis.
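The substitution formula (6.10) can be verified on a small example. In the sketch below, the matrices `Phi` and `C` and the coordinate rows are assumed sample data, and the helper names are hypothetical. The code converts primed coordinates to unprimed ones by $\boldsymbol{x} = \boldsymbol{x}' C^*$ and $[\boldsymbol{y}] = C[\boldsymbol{y}']$ and checks that (6.8) and (6.9) give the same value.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix (written C* in the text)."""
    return [list(row) for row in zip(*A)]

def form_value(Phi, x, y):
    """The scalar x Phi [y], with x used as a row and y as a column, as in (6.8)."""
    return matmul(matmul([x], Phi), [[v] for v in y])[0][0]

# Illustrative data (assumed): a matrix of a bilinear form and a nonsingular matrix C.
Phi = [[1, 2],
       [0, 3]]
C = [[1, 1],
     [0, 1]]
Phi_new = matmul(matmul(transpose(C), Phi), C)   # formula (6.10)

x_new, y_new = [2, -1], [1, 4]                   # coordinates in the primed basis
x_old = matmul([x_new], transpose(C))[0]         # x = x' C*
y_old = [row[0] for row in matmul(C, [[v] for v in y_new])]  # [y] = C [y']

old_value = form_value(Phi, x_old, y_old)
new_value = form_value(Phi_new, x_new, y_new)
```

Here `Phi_new` comes out as `[[1, 3], [1, 6]]`, and both `old_value` and `new_value` equal 1, as the identity preceding (6.10) requires.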

Since the rank of a matrix is invariant under multiplication on the left or right
by a nonsingular square matrix of appropriate order (Theorem 2.63), it follows that
the rank of the matrix $\Phi$ is the same as that of the matrix $\Phi'$ for any transition
matrix $C$. Thus the rank $r$ of the matrix of a bilinear form does not depend on the
basis in which the matrix is written, and consequently, we may call it simply the rank
of the bilinear form $\varphi$. In particular, if $r = n$, that is, if the rank coincides with the
dimension of the vector space $\mathsf{L}$, then the bilinear form $\varphi$ is said to be nonsingular.

The rank of a bilinear form can be defined in another way. By Theorem 6.3, to
every bilinear form $\varphi$ there corresponds a unique linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$,
and the connection between the two is laid out in (6.7). It is easily verified that if
we choose in the spaces $\mathsf{L}$ and $\mathsf{L}^*$ two dual bases, then the matrices of the bilinear
form $\varphi$ and the linear transformation $\mathcal{A}$ will coincide. This shows that the rank
of the bilinear form is the same as the rank of the linear transformation $\mathcal{A}$. From
this we derive that in particular, the form $\varphi$ is nonsingular if and only if the linear
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$ is an isomorphism.

A given quadratic form $\psi$ can be obtained from different bilinear forms $\varphi$; this
is related to the presence of similar terms in the expression (6.1) for a quadratic
form, about which we spoke above. In order to obtain uniqueness and agreement
with the properties of linearity, we shall proceed not as in secondary school, where,
for example, one writes the sum of terms $a_{12} x_1 x_2 + a_{21} x_2 x_1 = (a_{12} + a_{21}) x_1 x_2$, but
instead using a notation in which we do not collect like terms.

Remark 6.5 (On elements of fields) Additional refinements in this section are
directed at the reader who is interested in the case of vector spaces over an arbitrary
field $\mathbb{K}$. Here we shall introduce a certain limitation that will allow us to provide
a single account for the cases $\mathbb{K} = \mathbb{R}$, $\mathbb{K} = \mathbb{C}$, and all types of fields that we will
be concerned with. Namely, in what follows we shall assume that $\mathbb{K}$ is a field of
characteristic different from 2.¹ (We mentioned a similar limitation in the general
concept of field on p. 83.) Using the simplest properties that can be derived from
the definition of a field, it is easy to prove that in a field of characteristic different
from 2, there exists for an arbitrary element $a$ a unique element $b$ such that $2b = a$
(where $2b$ denotes the sum $b + b$). We then set $b = a/2$, and so whenever $a = 0$, it
follows that $b = 0$.

Theorem 6.6 Every quadratic form $\psi(\boldsymbol{x})$ on the space $\mathsf{L}$ over a field $\mathbb{K}$ of
characteristic different from 2 can be represented in the form

$$\psi(\boldsymbol{x}) = \varphi(\boldsymbol{x}, \boldsymbol{x}), \qquad (6.11)$$

where $\varphi$ is a symmetric bilinear form, and moreover, for the given quadratic form
$\psi$, the bilinear form $\varphi$ is unique.

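Proof By what we have said above, an arbitrary quadratic form $\psi(\boldsymbol{x})$ can be
represented in the form

$$\psi(\boldsymbol{x}) = \varphi_1(\boldsymbol{x}, \boldsymbol{x}), \qquad (6.12)$$

where $\varphi_1(\boldsymbol{x}, \boldsymbol{y})$ is some bilinear form, not necessarily symmetric. Let us set

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \frac{\varphi_1(\boldsymbol{x}, \boldsymbol{y}) + \varphi_1(\boldsymbol{y}, \boldsymbol{x})}{2}.$$

It is clear that $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is a bilinear form, and moreover, it is already symmetric.
From formula (6.12) follows the relationship (6.11), as asserted.

We shall now prove that if relationship (6.11) holds for some symmetric bilinear
form $\varphi(\boldsymbol{x}, \boldsymbol{y})$, then $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is uniquely determined by the quadratic form $\psi(\boldsymbol{x})$. To
see this, let us calculate $\psi(\boldsymbol{x} + \boldsymbol{y})$. By assumption and the properties of the bilinear
form $\varphi$, we have

$$\psi(\boldsymbol{x} + \boldsymbol{y}) = \varphi(\boldsymbol{x} + \boldsymbol{y}, \boldsymbol{x} + \boldsymbol{y}) = \varphi(\boldsymbol{x}, \boldsymbol{x}) + \varphi(\boldsymbol{y}, \boldsymbol{y}) + \varphi(\boldsymbol{x}, \boldsymbol{y}) + \varphi(\boldsymbol{y}, \boldsymbol{x}). \qquad (6.13)$$

In view of the symmetry of the form $\varphi$, we have

$$\psi(\boldsymbol{x} + \boldsymbol{y}) = \psi(\boldsymbol{x}) + \psi(\boldsymbol{y}) + 2\varphi(\boldsymbol{x}, \boldsymbol{y}),$$

which implies that

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \frac{1}{2}\bigl(\psi(\boldsymbol{x} + \boldsymbol{y}) - \psi(\boldsymbol{x}) - \psi(\boldsymbol{y})\bigr). \qquad (6.14)$$

This last relationship uniquely determines a bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ associated with
the given quadratic form $\psi(\boldsymbol{x})$. □

¹ Fields of characteristic different from 2 are what are most frequently encountered. However, fields
of characteristic 2, which we are excluding from consideration here, have important applications,
for example in discrete mathematics and cryptography.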
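Both halves of the proof of Theorem 6.6 can be tried out on numbers. The sketch below works over the rationals, using Python's `fractions` module to keep the division by 2 exact; the matrix `Phi1` and the vectors are assumed sample data, and all function names are hypothetical. It symmetrizes a non-symmetric bilinear form and then recovers the symmetric form from the quadratic form alone by formula (6.14).

```python
from fractions import Fraction

def bilinear(Phi, x, y):
    """phi(x, y) computed from the matrix Phi by formula (6.4)."""
    n = len(Phi)
    return sum(Phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def symmetrize(Phi1):
    """Matrix of phi(x, y) = (phi1(x, y) + phi1(y, x)) / 2."""
    n = len(Phi1)
    return [[Fraction(Phi1[i][j] + Phi1[j][i], 2) for j in range(n)] for i in range(n)]

def polarize(psi, x, y):
    """Right-hand side of (6.14): (psi(x + y) - psi(x) - psi(y)) / 2."""
    s = [a + b for a, b in zip(x, y)]
    return Fraction(psi(s) - psi(x) - psi(y), 2)

# Illustrative data (assumed): a bilinear form that is not symmetric.
Phi1 = [[1, 2],
        [0, 3]]
Phi = symmetrize(Phi1)

def psi(x):
    """The quadratic form psi(x) = phi1(x, x)."""
    return bilinear(Phi1, x, x)

x, y = [1, 1], [2, -1]
from_matrix = bilinear(Phi, x, y)
from_polarization = polarize(psi, x, y)
```

For this data `Phi` is `[[1, 1], [1, 3]]`, and `from_matrix` and `from_polarization` both equal 0, so the symmetric form is indeed determined by `psi`.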

With the same assumptions, we have the following result for antisymmetric
forms.

Theorem 6.7 For every antisymmetric bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ on the space $\mathsf{L}$ over a
field $\mathbb{K}$ of characteristic different from 2, we have

$$\varphi(\boldsymbol{x}, \boldsymbol{x}) = 0. \qquad (6.15)$$

Conversely, if equality (6.15) is satisfied for every vector $\boldsymbol{x} \in \mathsf{L}$, then the bilinear
form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is antisymmetric.

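Proof If the form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is antisymmetric, then transposing the arguments in
the expression $\varphi(\boldsymbol{x}, \boldsymbol{x})$ leads to the relationship $\varphi(\boldsymbol{x}, \boldsymbol{x}) = -\varphi(\boldsymbol{x}, \boldsymbol{x})$, and then
$2\varphi(\boldsymbol{x}, \boldsymbol{x}) = 0$, from which follows equality (6.15), since by the condition of the
theorem, the field $\mathbb{K}$ has characteristic different from 2. Conversely, if $\varphi(\boldsymbol{x}, \boldsymbol{x}) = 0$
for every vector $\boldsymbol{x} \in \mathsf{L}$, then this holds in particular for the vector $\boldsymbol{x} + \boldsymbol{y}$, that is, we
obtain

$$\varphi(\boldsymbol{x} + \boldsymbol{y}, \boldsymbol{x} + \boldsymbol{y}) = \varphi(\boldsymbol{x}, \boldsymbol{x}) + \varphi(\boldsymbol{x}, \boldsymbol{y}) + \varphi(\boldsymbol{y}, \boldsymbol{x}) + \varphi(\boldsymbol{y}, \boldsymbol{y}) = 0.$$

Since we have $\varphi(\boldsymbol{x}, \boldsymbol{x}) = \varphi(\boldsymbol{y}, \boldsymbol{y}) = 0$ by the hypothesis of the theorem, it follows
that $\varphi(\boldsymbol{x}, \boldsymbol{y}) + \varphi(\boldsymbol{y}, \boldsymbol{x}) = 0$, which yields that the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is
antisymmetric. □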
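The first half of the proof uses the hypothesis on the characteristic, and a tiny computation shows what goes wrong without it. The following sketch is an added illustration, not taken from the text: it models the field of residues modulo a prime $p$ by Python integers reduced modulo $p$ and considers the form $\varphi(x, y) = xy$ on the one-dimensional space over that field. The function names are hypothetical.

```python
def make_form(p):
    """The bilinear form phi(x, y) = x*y on the one-dimensional space over integers mod p."""
    return lambda x, y: (x * y) % p

def is_antisymmetric(phi, p):
    """Check phi(x, y) = -phi(y, x) for all pairs of residues mod p."""
    return all(phi(x, y) == (-phi(y, x)) % p for x in range(p) for y in range(p))

def vanishes_on_diagonal(phi, p):
    """Check phi(x, x) = 0 for all residues mod p, that is, equality (6.15)."""
    return all(phi(x, x) == 0 for x in range(p))

phi2 = make_form(2)   # characteristic 2, excluded by the theorem
phi3 = make_form(3)   # characteristic 3, covered by the theorem

char2_antisymmetric = is_antisymmetric(phi2, 2)
char2_diagonal_zero = vanishes_on_diagonal(phi2, 2)
char3_antisymmetric = is_antisymmetric(phi3, 3)
```

Modulo 2 we have $-1 = 1$, so the form satisfies (6.6) and `char2_antisymmetric` is `True`, yet $\varphi(1, 1) = 1$ and `char2_diagonal_zero` is `False`. Modulo 3 the same form is not antisymmetric at all, in agreement with the theorem.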

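Let us note that the way of writing the quadratic form $\psi(\boldsymbol{x})$ in the form (6.11)
established by Theorem 6.6, where $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is a symmetric bilinear form, shows us
how to write similar terms in the representation (6.1). Indeed, if we have

$$\boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_n \boldsymbol{e}_n, \qquad \boldsymbol{y} = y_1 \boldsymbol{e}_1 + \cdots + y_n \boldsymbol{e}_n,$$

and $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is a bilinear form, then

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = \sum_{i,j=1}^{n} \varphi_{ij} x_i y_j,$$

where $\varphi_{ij} = \varphi(\boldsymbol{e}_i, \boldsymbol{e}_j)$. The symmetry of the form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ implies that $\varphi_{ij} = \varphi_{ji}$ for
all $i, j = 1, \dots, n$. Then the representation

$$\psi(x_1, \dots, x_n) = \sum_{i,j=1}^{n} \varphi_{ij} x_i x_j$$

contains like terms $\varphi_{ij} x_i x_j$ and $\varphi_{ji} x_j x_i$ for $i \neq j$. Then if $i \neq j$, the term with $x_i x_j$
occurs in the sum twice: as $\varphi_{ij} x_i x_j$ and as $\varphi_{ji} x_j x_i$. Since $\varphi_{ij} = \varphi_{ji}$, collecting
like terms leads to this sum being written in the form $2\varphi_{ij} x_i x_j$.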

For example, the coefficients of the quadratic form $x_1^2 + x_1 x_2 + x_2^2$ are given
by $\varphi_{11} = 1$, $\varphi_{22} = 1$, and $\varphi_{12} = \varphi_{21} = 1/2$. Such a way of writing things may seem
strange at first glance, but as we shall soon see, it offers many advantages.
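The passage from a polynomial with collected like terms to the symmetric coefficients $\varphi_{ij}$ is mechanical: diagonal coefficients are kept, and each mixed coefficient is split evenly between $\varphi_{ij}$ and $\varphi_{ji}$. A minimal sketch follows, with a hypothetical function name and an input format (a dictionary keyed by index pairs) chosen only for this illustration.

```python
from fractions import Fraction

def symmetric_matrix(n, collected):
    """Symmetric matrix of a quadratic form in n variables.

    `collected` maps a pair (i, j) with i <= j (1-based indices) to the coefficient
    of x_i * x_j after like terms have been collected.
    """
    Phi = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), coeff in collected.items():
        if i == j:
            Phi[i - 1][i - 1] = Fraction(coeff)
        else:
            # The mixed term is shared equally: phi_ij = phi_ji = coeff / 2.
            Phi[i - 1][j - 1] = Phi[j - 1][i - 1] = Fraction(coeff) / 2
    return Phi

# The example from the text: x1^2 + x1*x2 + x2^2.
Phi = symmetric_matrix(2, {(1, 1): 1, (1, 2): 1, (2, 2): 1})
```

The result has 1 on the diagonal and 1/2 in both off-diagonal positions, matching the coefficients stated above.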


6.2 Reduction to Canonical Form

The main goal of this section is to transform quadratic forms into the simplest
possible form, called canonical. As in the case of the matrix of a linear transformation,
canonical form is obtained by the selection of a special basis of the given vector
space. Namely, the required basis must possess the property that the matrix of the
symmetric bilinear form corresponding to the given quadratic form assumes
diagonal form in that basis. This property is directly connected to the important notion
of orthogonality, which will be used repeatedly in this and subsequent chapters. We
note that the notion of orthogonality can be formulated in a way that is well defined
for bilinear forms that are not necessarily symmetric, but it can be most simply
defined for symmetric and antisymmetric bilinear forms. In this section, we shall
consider only symmetric bilinear forms.

Thus let there be given on the finite-dimensional vector space $\mathsf{L}$ a symmetric
bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$.

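Definition 6.8 Vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are said to be orthogonal if $\varphi(\boldsymbol{x}, \boldsymbol{y}) = 0$.

We observe that in light of the symmetry condition $\varphi(\boldsymbol{y}, \boldsymbol{x}) = \varphi(\boldsymbol{x}, \boldsymbol{y})$, the
equality $\varphi(\boldsymbol{x}, \boldsymbol{y}) = 0$ is equivalent to $\varphi(\boldsymbol{y}, \boldsymbol{x}) = 0$. This is true as well for antisymmetric
bilinear forms. However, if we do not impose a symmetry or antisymmetry
condition on the bilinear form, then the vector $\boldsymbol{x}$ can be orthogonal to the vector $\boldsymbol{y}$ without
$\boldsymbol{y}$ being orthogonal to $\boldsymbol{x}$. This leads to the concepts of left and right orthogonality
and some very beautiful geometry, but it would take us beyond the scope of this
book. A vector $\boldsymbol{x} \in \mathsf{L}$ is said to be orthogonal to a subspace $\mathsf{L}' \subset \mathsf{L}$ relative to $\varphi$ if it
is orthogonal to every vector $\boldsymbol{y} \in \mathsf{L}'$, that is, if $\varphi(\boldsymbol{x}, \boldsymbol{y}) = 0$ for all $\boldsymbol{y} \in \mathsf{L}'$.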

It follows at once from the definition of bilinearity that the collection of all
vectors $\boldsymbol{x}$ orthogonal to a subspace $\mathsf{L}'$ with respect to a given bilinear form $\varphi$ is itself a
subspace of $\mathsf{L}$. It is called the orthogonal complement of the subspace $\mathsf{L}'$ with respect
to the form $\varphi$ and is denoted by $(\mathsf{L}')^{\perp}_{\varphi}$.

In particular, for $\mathsf{L}' = \mathsf{L}$, the subspace $(\mathsf{L})^{\perp}_{\varphi}$ represents the totality of vectors $\boldsymbol{x} \in \mathsf{L}$
for which the equation $\varphi(\boldsymbol{x}, \boldsymbol{y}) = 0$ is satisfied for all $\boldsymbol{y} \in \mathsf{L}$. This subspace is called
the radical of the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$. From the definition of a bilinear form, it
follows at once that the radical consists of all vectors $\boldsymbol{x} \in \mathsf{L}$ such that

$$\varphi(\boldsymbol{x}, \boldsymbol{e}_i) = 0 \quad \text{for all } i = 1, \dots, n, \qquad (6.16)$$

where $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ is some basis of the space $\mathsf{L}$. The equalities (6.16) are linear
homogeneous equations that define the radical as a subspace of $\mathsf{L}$. If we write down
the vector $\boldsymbol{x}$ in the chosen basis, that is, in the form $\boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_n \boldsymbol{e}_n$, then in
view of formula (6.4), we obtain from the equalities (6.16) a system of linear
homogeneous equations in the unknowns $x_1, \dots, x_n$. The matrix of this system coincides
with the matrix $\Phi$ of the bilinear form $\varphi$ in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$. Thus the space
$(\mathsf{L})^{\perp}_{\varphi}$ satisfies the conditions of Example 3.65 from Sect. 3.5 (p. 114). Consequently,
$\dim (\mathsf{L})^{\perp}_{\varphi} = n - r$, where $r$ is the rank of the matrix of the linear system, that is, the
rank of the bilinear form $\varphi$. We therefore obtain the equality

$$r = \dim \mathsf{L} - \dim (\mathsf{L})^{\perp}_{\varphi}. \qquad (6.17)$$
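Equality (6.17) can be illustrated with a short computation. The sketch below is written under stated assumptions: the symmetric matrix `Phi` and the vector `x` are sample data, the rank is found by ordinary Gaussian elimination over the rationals, and the function names are hypothetical. It computes the rank $r$ and checks that a particular vector satisfies the equations (6.16).

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, computed by Gaussian elimination with exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

def in_radical(Phi, x):
    """Check the equations (6.16): phi(x, e_i) = 0 for every basis vector e_i."""
    n = len(Phi)
    return all(sum(x[j] * Phi[j][i] for j in range(n)) == 0 for i in range(n))

# Illustrative symmetric matrix (assumed data) of a bilinear form on a 3-dimensional space.
Phi = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 2]]
r = rank(Phi)
radical_dimension = len(Phi) - r          # dim of the radical, by (6.17)
x = [1, -1, 0]                            # a candidate vector of the radical
```

Here `r` is 2, so the radical is one-dimensional, and `in_radical(Phi, x)` confirms that `x` spans it.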

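Theorem 6.9 Let $\mathsf{L}' \subset \mathsf{L}$ be a subspace such that the restriction of the bilinear form
$\varphi(\boldsymbol{x}, \boldsymbol{y})$ to $\mathsf{L}'$ is a nonsingular bilinear form. We then have the decomposition

$$\mathsf{L} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}_{\varphi}. \qquad (6.18)$$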

Proof First of all, we note that by the conditions of the theorem, the intersection
$\mathsf{L}' \cap (\mathsf{L}')^{\perp}_{\varphi}$ is equal to the zero space $(\boldsymbol{0})$. Indeed, it consists of all vectors $\boldsymbol{x} \in \mathsf{L}'$
such that $\varphi(\boldsymbol{x}, \boldsymbol{y}) = 0$ for all $\boldsymbol{y} \in \mathsf{L}'$, and hence only of the null vector, since by the
condition, the restriction of $\varphi$ to the subspace $\mathsf{L}'$ is a nonsingular bilinear form. Thus
it suffices to prove that $\mathsf{L}' + (\mathsf{L}')^{\perp}_{\varphi} = \mathsf{L}$. We shall present two proofs of this fact in
order to demonstrate two different lines of reasoning used in the theory of vector
spaces.

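First proof. We shall use the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$ constructed in
Theorem 6.3 corresponding to the bilinear form $\varphi$. Assigning to each linear function
on $\mathsf{L}$ its restriction to the subspace $\mathsf{L}' \subset \mathsf{L}$, we obtain the linear transformation $\mathcal{B} :
\mathsf{L}^* \to (\mathsf{L}')^*$. If we apply in sequence the linear transformations $\mathcal{A}$ and $\mathcal{B}$, we obtain
the linear transformation $\mathcal{C} = \mathcal{B}\mathcal{A} : \mathsf{L} \to (\mathsf{L}')^*$.

The kernel $\mathsf{L}_1$ of the transformation $\mathcal{C}$ consists of the vectors $\boldsymbol{y} \in \mathsf{L}$ such that
$\varphi(\boldsymbol{x}, \boldsymbol{y}) = 0$ for all $\boldsymbol{x} \in \mathsf{L}'$, since by definition, $\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$. This implies
that $\mathsf{L}_1 = (\mathsf{L}')^{\perp}_{\varphi}$. Let us show that the image $\mathsf{L}_2$ of the transformation $\mathcal{C}$ is equal to
the entire subspace $(\mathsf{L}')^*$. We shall prove an even stronger result: an arbitrary vector
$\boldsymbol{u} \in (\mathsf{L}')^*$ can be represented in the form $\boldsymbol{u} = \mathcal{C}(\boldsymbol{v})$, where $\boldsymbol{v} \in \mathsf{L}'$. For this, we must
consider the restriction of the transformation $\mathcal{C}$ to the subspace $\mathsf{L}'$. By definition,
it coincides with the transformation $\mathcal{A}' : \mathsf{L}' \to (\mathsf{L}')^*$ constructed in Theorem 6.3,
which corresponds to the restriction of the bilinear form $\varphi$ to $\mathsf{L}'$. By assumption, the
restriction of the form $\varphi$ to $\mathsf{L}'$ is nonsingular, which implies that the transformation
$\mathcal{A}'$ is an isomorphism. From this, it follows in particular that its image is the entire
subspace $(\mathsf{L}')^*$.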

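Now we shall make use of Theorem 3.72 and apply relationship (3.47) to the
transformation $\mathcal{C}$. We obtain $\dim \mathsf{L}_1 + \dim \mathsf{L}_2 = \dim \mathsf{L}$. Since $\mathsf{L}_2 = (\mathsf{L}')^*$, it follows
by Theorem 3.78 that $\dim \mathsf{L}_2 = \dim \mathsf{L}'$. Recalling also that $\mathsf{L}_1 = (\mathsf{L}')^{\perp}_{\varphi}$, we have
finally the equality

$$\dim (\mathsf{L}')^{\perp}_{\varphi} + \dim \mathsf{L}' = \dim \mathsf{L}. \qquad (6.19)$$

Since $\mathsf{L}' \cap (\mathsf{L}')^{\perp}_{\varphi} = (\boldsymbol{0})$, we conclude by Corollary 3.15 (p. 85) that $\mathsf{L}' + (\mathsf{L}')^{\perp}_{\varphi} =
\mathsf{L}' \oplus (\mathsf{L}')^{\perp}_{\varphi}$. From Theorems 3.24, 3.38 and the relationship (6.19), it follows that
$\mathsf{L}' \oplus (\mathsf{L}')^{\perp}_{\varphi} = \mathsf{L}$.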

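Second proof. We need to represent an arbitrary vector $\boldsymbol{x} \in \mathsf{L}$ in the form $\boldsymbol{x} =
\boldsymbol{u} + \boldsymbol{v}$, where $\boldsymbol{u} \in \mathsf{L}'$ and $\boldsymbol{v} \in (\mathsf{L}')^{\perp}_{\varphi}$. This is clearly equivalent to the condition $\boldsymbol{x} - \boldsymbol{u} \in
(\mathsf{L}')^{\perp}_{\varphi}$, and therefore to the condition $\varphi(\boldsymbol{x} - \boldsymbol{u}, \boldsymbol{y}) = 0$ for all $\boldsymbol{y} \in \mathsf{L}'$. Recalling the
properties of a bilinear form, we see that it suffices that the last equation be satisfied
for vectors $\boldsymbol{y} = \boldsymbol{e}_i$, $i = 1, \dots, r$, where $\boldsymbol{e}_1, \dots, \boldsymbol{e}_r$ is some basis of the space $\mathsf{L}'$.
In view of the bilinearity of the form $\varphi$, our relationships can be written in the
form

$$\varphi(\boldsymbol{u}, \boldsymbol{e}_i) = \varphi(\boldsymbol{x}, \boldsymbol{e}_i) \quad \text{for all } i = 1, \dots, r. \qquad (6.20)$$

We now represent the vector $\boldsymbol{u}$ as $\boldsymbol{u} = x_1 \boldsymbol{e}_1 + \cdots + x_r \boldsymbol{e}_r$. Relationship (6.20) gives
a system of $r$ linear equations

$$\varphi(\boldsymbol{e}_1, \boldsymbol{e}_i) x_1 + \cdots + \varphi(\boldsymbol{e}_r, \boldsymbol{e}_i) x_r = \varphi(\boldsymbol{x}, \boldsymbol{e}_i), \quad i = 1, \dots, r, \qquad (6.21)$$

with unknowns $x_1, \dots, x_r$. The matrix of the system (6.21) has the form

$$\Phi = \begin{pmatrix} \varphi(\boldsymbol{e}_1, \boldsymbol{e}_1) & \cdots & \varphi(\boldsymbol{e}_1, \boldsymbol{e}_r) \\ \vdots & \ddots & \vdots \\ \varphi(\boldsymbol{e}_r, \boldsymbol{e}_1) & \cdots & \varphi(\boldsymbol{e}_r, \boldsymbol{e}_r) \end{pmatrix}.$$

But it is easy to see that $\Phi$ is the matrix of the restriction of the bilinear
form $\varphi$ to the subspace $\mathsf{L}'$ written in the basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_r$. Since by assumption,
such a form is nonsingular, its matrix is also nonsingular, and this implies
that the system of equations (6.20) has a solution. In other words, we can find
a vector $\boldsymbol{u} \in \mathsf{L}'$ satisfying all the relationships (6.20), which proves our
assertion. □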
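The second proof is constructive, and its steps can be followed in code: form the system (6.21), solve it, and read off $\boldsymbol{u}$ and $\boldsymbol{v} = \boldsymbol{x} - \boldsymbol{u}$. The sketch below is an illustration only. The symmetric matrix `Phi`, the basis of the subspace, and the vector `x` are assumed sample data, the function names are hypothetical, and the linear system is solved by a plain Gauss-Jordan elimination over the rationals.

```python
from fractions import Fraction

def phi(Phi, x, y):
    """Bilinear form with matrix Phi, formula (6.4)."""
    return sum(Phi[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))

def solve(M, b):
    """Solve M z = b for a nonsingular square matrix M by Gauss-Jordan elimination."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for i in range(n):
            if i != c:
                A[i] = [a - A[i][c] * p for a, p in zip(A[i], A[c])]
    return [row[n] for row in A]

def decompose(Phi, basis, x):
    """Write x = u + v with u in the span of `basis` and v orthogonal to that span."""
    r = len(basis)
    # System (6.21): sum over j of phi(e_j, e_i) * x_j = phi(x, e_i), i = 1, ..., r.
    M = [[phi(Phi, basis[j], basis[i]) for j in range(r)] for i in range(r)]
    rhs = [phi(Phi, x, basis[i]) for i in range(r)]
    coeffs = solve(M, rhs)
    u = [sum(coeffs[j] * basis[j][k] for j in range(r)) for k in range(len(x))]
    v = [xk - uk for xk, uk in zip(x, u)]
    return u, v

# Illustrative data (assumed): a symmetric form on a 3-dimensional space and a 2-dimensional subspace.
Phi = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 1]]
basis = [[1, 1, 0], [0, 0, 1]]
u, v = decompose(Phi, basis, [3, 1, 2])
```

For this data the system (6.21) has the diagonal matrix with entries 2 and 1, and the result is `u = [2, 2, 2]`, `v = [1, -1, 0]`; one checks that `v` is orthogonal to both basis vectors of the subspace.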

We shall now apply these ideas related to bilinear forms to the theory of quadratic
forms. Our goal is to find a basis in which the matrix of a given quadratic form $\psi(\boldsymbol{x})$
has the simplest form possible.

Theorem 6.10 For every quadratic form $\psi(\boldsymbol{x})$, there exists a basis in which the
form can be written as

$$\psi(\boldsymbol{x}) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2, \qquad (6.22)$$

where $x_1, \dots, x_n$ are the coordinates of the vector $\boldsymbol{x}$ in this basis.

Proof Let $\varphi(x, y)$ be a symmetric bilinear form associated with the quadratic form
$\psi(x)$ by the formula (6.11). If $\psi(x)$ is identically equal to zero, then the theorem
clearly is true (for $\lambda_1 = \cdots = \lambda_n = 0$). If the quadratic form $\psi(x)$ is not identically
equal to zero, then there exists a vector $e_1$ such that $\psi(e_1) \neq 0$, that is, $\varphi(e_1, e_1) \neq 0$.

This implies that the restriction of the bilinear form $\varphi$ to the subspace $\mathsf{L}' = \langle e_1 \rangle$ is
nonsingular, and therefore, by Theorem 6.9, for the subspace $\mathsf{L}' = \langle e_1 \rangle$ we have
the decomposition (6.18), that is, $\mathsf{L} = \langle e_1 \rangle \oplus \langle e_1 \rangle^{\perp}_{\varphi}$. Since $\dim \langle e_1 \rangle = 1$, then by
Theorem 3.38, we obtain that $\dim \langle e_1 \rangle^{\perp}_{\varphi} = n - 1$.

Proceeding by induction, we may assume the theorem to have been proved for the
space $\langle e_1 \rangle^{\perp}_{\varphi}$. Thus in this space there exists a basis $e_2, \ldots, e_n$ such that $\varphi(e_i, e_j) = 0$
for all $i \neq j$, $i, j \geq 2$. Since in addition $\varphi(e_1, e_j) = 0$ for all $j \geq 2$ by the definition of
the orthogonal complement, in the basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$, the quadratic form
$\psi(x)$ can be written as (6.22) with $\lambda_i = \varphi(e_i, e_i)$. □
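
The inductive argument in the proof of Theorem 6.10 can be turned into a small computation. The following is only a sketch under our own assumptions: the form is given by its symmetric matrix with rational entries in the standard basis, and the names `phi` and `canonical_basis` are hypothetical helpers introduced here, not notation from the text. At each step we pick a vector $e$ with $\psi(e) \neq 0$ and replace the remaining vectors by their components in the orthogonal complement of $e$ with respect to $\varphi$.

```python
from fractions import Fraction

def phi(A, u, v):
    """Value of the bilinear form with matrix A on coordinate vectors u, v."""
    n = len(A)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def canonical_basis(A):
    """Return basis vectors (coordinate lists) in which the symmetric form A is diagonal."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    basis = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        rest = range(k, n)
        # Look for a remaining basis vector e with psi(e) != 0.
        pivot = next((i for i in rest if phi(A, basis[i], basis[i]) != 0), None)
        if pivot is None:
            pair = next(((i, j) for i in rest for j in rest
                         if i < j and phi(A, basis[i], basis[j]) != 0), None)
            if pair is None:
                break  # the form vanishes identically on the remaining subspace
            i, j = pair
            # psi(e_i + e_j) = 2*phi(e_i, e_j) != 0, since psi(e_i) = psi(e_j) = 0.
            basis[i] = [a + b for a, b in zip(basis[i], basis[j])]
            pivot = i
        basis[k], basis[pivot] = basis[pivot], basis[k]
        e = basis[k]
        d = phi(A, e, e)
        # Project the remaining vectors onto the phi-orthogonal complement of e.
        for j in range(k + 1, n):
            c = phi(A, basis[j], e) / d
            basis[j] = [b - c * a for a, b in zip(e, basis[j])]
    return basis

A = [[0, 1], [1, 0]]          # psi(x) = 2*x1*x2
basis = canonical_basis(A)
gram = [[phi(A, u, v) for v in basis] for u in basis]
print(gram)
```

The matrix `gram` is the matrix of the form in the new basis; its off-diagonal entries vanish, and its diagonal entries play the role of the numbers $\lambda_i$ in (6.22).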


We observe that one and the same quadratic form $\psi$ can be of the form (6.22) in
various bases, and in this case, the numbers $\lambda_1, \ldots, \lambda_n$ might differ in various bases.
For example, if in a one-dimensional space whose basis consists of one nonzero
vector $e$, we define the quadratic form $\psi$ by the relation $\psi(xe) = x^2$, then in the
basis consisting of the vector $e' = \lambda e$, $\lambda \neq 0$, it can be written as $\psi(xe') = (\lambda x)^2 = \lambda^2 x^2$.

If in a certain basis a quadratic form can be written as in (6.22), then we say that
in that basis, it is in canonical form. Theorem 6.10 is called the theorem on reducing
a quadratic form to canonical form. From what we have said above, it follows that
reducing a quadratic form to canonical form is not unique.

If in the basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$, the quadratic form $\psi(x)$ has the form
established in Theorem 6.10, then its matrix in this basis is equal to

$$\Psi = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \tag{6.23}$$

It is clear that the rank of the matrix $\Psi$ is equal to the number of nonzero values
among $\lambda_1, \ldots, \lambda_n$. As we saw in the previous section, the rank of the matrix $\Psi$ (that
is, the rank of the quadratic form $\psi(x)$) does not depend on the choice of basis in
which the matrix $\Psi$ is written. Therefore, this number is the same for every basis
for which Theorem 6.10 holds.

It is useful to write down the results we have obtained in matrix form. We may
reformulate Theorem 6.10 using formula (6.10) obtained in the previous section for
replacing the matrix of a bilinear form by a change in basis.


Theorem 6.11 For an arbitrary symmetric matrix $\Phi$, there exists a nonsingular
matrix $C$ such that the matrix $C^* \Phi C$ is diagonal. If we select a different matrix $C$, we
may obtain different diagonal matrices $C^* \Phi C$, but the number of nonzero elements
on the main diagonal will always be the same.


A completely analogous argument can be applied to the case of antisymmetric
bilinear forms. The following theorem is an analogue of Theorem 6.10.

Theorem 6.12 For every antisymmetric bilinear form $\varphi(x, y)$, there exists a basis
$e_1, \ldots, e_n$ whose first $2r$ vectors can be combined into pairs $(e_{2i-1}, e_{2i})$,
$i = 1, \ldots, r$, such that

$$\varphi(e_{2i-1}, e_{2i}) = 1, \quad \varphi(e_{2i}, e_{2i-1}) = -1 \quad \text{for all } i = 1, \ldots, r,$$

and $\varphi(e_i, e_j) = 0$ for all remaining pairs $(i, j)$, in particular whenever $|i - j| > 1$
or $i > 2r$ or $j > 2r$.

Thus in the given basis, the matrix of the bilinear form $\varphi$ takes the form

$$\Phi = \begin{pmatrix}
0 & 1 & & & & & & \\
-1 & 0 & & & & & & \\
& & \ddots & & & & & \\
& & & 0 & 1 & & & \\
& & & -1 & 0 & & & \\
& & & & & 0 & & \\
& & & & & & \ddots & \\
& & & & & & & 0
\end{pmatrix}. \tag{6.24}$$


Proof This theorem is an exact parallel to Theorem 6.10. If $\varphi(x, y) = 0$ for all $x$
and $y$, then the assertion of the theorem is obvious (for $r = 0$). However, if this is not
the case, then there exist two vectors $e_1$ and $e_2$ for which $\varphi(e_1, e_2) = \alpha \neq 0$. Setting
$e_1' = \alpha^{-1} e_1$, we obtain that $\varphi(e_1', e_2) = 1$. The matrix of the form $\varphi$ restricted to the
subspace $\mathsf{L}' = \langle e_1', e_2 \rangle$ in the basis $e_1', e_2$ has the form

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \tag{6.25}$$

and consequently, it is nonsingular. Then on the basis of Theorem 6.9, we obtain the
decomposition $\mathsf{L} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}_{\varphi}$, where $\dim (\mathsf{L}')^{\perp}_{\varphi} = n - 2$, with $n = \dim \mathsf{L}$. Proceeding
by induction, we may assume that the theorem has been proved for forms $\varphi$ defined
on the space $(\mathsf{L}')^{\perp}_{\varphi}$. If $f_1, \ldots, f_{n-2}$ is such a basis of the space $(\mathsf{L}')^{\perp}_{\varphi}$, the existence
of which is asserted by Theorem 6.12, then it is obvious that $e_1', e_2, f_1, \ldots, f_{n-2}$
is the required basis of the original space $\mathsf{L}$. □


The number $n - 2r$ is equal to the dimension of the radical of the bilinear form $\varphi$,
and therefore, it is the same for all bases in which the matrix of the bilinear form $\varphi$
is brought into the form (6.24). The rank of the matrix (6.25) is equal to 2, while the
matrix (6.24) contains $r$ such blocks on the main diagonal. Therefore, the rank of
the matrix (6.24) is equal to $2r$. Thus from Theorem 6.12, we obtain the following
corollary.

Corollary 6.13 The rank of an antisymmetric bilinear form is an even number.
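
Corollary 6.13 is easy to test numerically. The sketch below is an illustration of ours, not part of the original argument: `rank` is a hypothetical helper that computes the rank of a rational matrix by Gaussian elimination, and the two antisymmetric matrices are arbitrary examples.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix with rational entries, by Gaussian elimination."""
    M = [[Fraction(a) for a in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Two antisymmetric matrices: A[i][j] == -A[j][i].
A3 = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]
A4 = [[0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 2], [0, 0, -2, 0]]
print(rank(A3), rank(A4))
```

In particular, an antisymmetric matrix of odd order, such as `A3`, can never have full rank.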


Let us now translate everything that we have proved for antisymmetric bilinear
forms into the language of matrices. Here our assertions will be the same as for
symmetric matrices, and they are proved in exactly the same manner. We obtain that
for an arbitrary antisymmetric matrix $\Phi$, there exists a nonsingular matrix $C$ such
that the matrix

$$\Phi' = C^* \Phi C \tag{6.26}$$

has the form (6.24).

Matrices $\Phi$ and $\Phi'$ that are related by (6.26) for some nonsingular matrix $C$ are
said to be equivalent. The same term is applied to the quadratic forms associated
with these matrices (for a particular choice of basis).

It is easy to verify that the concept thus introduced is an equivalence relation
on the set of square matrices of a given order or indeed on the set of quadratic
forms. The reflexive property is obvious: it is necessary only to substitute the matrix
$C = E$ into formula (6.26). Multiplying both sides of equality (6.26) on the right by
the matrix $B = C^{-1}$ and on the left by the matrix $B^*$, and taking into account the
relationship $(C^{-1})^* = (C^*)^{-1}$, we obtain the equality $\Phi = B^* \Phi' B$, which establishes
the symmetric property.

Finally, let us verify the property of transitivity. Suppose we are given the
relationships (6.26) and $\Phi'' = D^* \Phi' D$ for some nonsingular matrices $C$ and $D$.
Then if we substitute the first of these into the second, we obtain the equality
$\Phi'' = D^* C^* \Phi C D$. Setting $B = CD$ and taking into account $B^* = D^* C^*$, we
obtain the equality $\Phi'' = B^* \Phi B$, which establishes the equivalence of the matrices $\Phi$
and $\Phi''$.

It is now possible to reformulate Theorems 6.10 and 6.12 in the following form.

Theorem 6.14 Every symmetric matrix is equivalent to a diagonal matrix.

Theorem 6.15 Every antisymmetric matrix $\Phi$ is equivalent to a matrix of the form
(6.24), where the number $r$ is equal to one-half the rank of the matrix $\Phi$.

From Theorems 6.14 and 6.15, it follows that all equivalent symmetric matrices
and all equivalent antisymmetric matrices have the same rank, and for antisymmetric
matrices, equivalence is the same as the equality of their ranks, that is, two
antisymmetric matrices of a given order are equivalent if and only if they have the same
rank.

Let us conclude with the observation that all the concepts investigated in this
section can be expressed in the language of bilinear forms given by Theorem 6.3. By
this theorem, every bilinear form $\varphi(x, y)$ on a vector space $\mathsf{L}$ can be written uniquely
in the form $\varphi(x, y) = (x, \mathcal{A}(y))$, where $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$ is some linear transformation.
As proved in Sect. 6.1, the symmetry of the form $\varphi$ is equivalent to $\mathcal{A}^* = \mathcal{A}$, while
antisymmetry is equivalent to $\mathcal{A}^* = -\mathcal{A}$. In the first case, the transformation $\mathcal{A}$ is
said to be symmetric, and in the second case, antisymmetric. Thus Theorems 6.10
and 6.12 are equivalent to the following assertions. For an arbitrary symmetric
transformation $\mathcal{A}$, there exists a basis of the vector space $\mathsf{L}$ in which the matrix of this
transformation has the diagonal form (6.23). Similarly, for an arbitrary antisymmetric
transformation $\mathcal{A}$, there exists a basis of the space $\mathsf{L}$ in which the matrix of this
transformation has the form (6.24). More precisely, in both these statements, we are
talking about the choice of basis in $\mathsf{L}$ and its dual basis in $\mathsf{L}^*$, since the
transformation $\mathcal{A}$ maps $\mathsf{L}$ to $\mathsf{L}^*$.


6.3 Complex, Real, and Hermitian Forms 

We begin this section by examining a quadratic form $\psi$ in a complex vector space $\mathsf{L}$.
By Theorem 6.10, it can be written, in terms of some basis $e_1, \ldots, e_n$, in the form

$$\psi(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2,$$

where $x_1, \ldots, x_n$ are the coordinates of the vector $x$ in this basis. This implies that
for the associated symmetric bilinear form $\varphi(x, y)$, we have $\varphi(e_i, e_j) = 0$
for $i \neq j$ and $\varphi(e_i, e_i) = \lambda_i$. Here, the number of values $\lambda_i$ different from zero is
equal to the rank $r$ of the bilinear form $\varphi$. By changing the numeration of the basis
vectors if necessary, we may assume that $\lambda_i \neq 0$ for $i \leq r$ and $\lambda_i = 0$ for $i > r$. We
may then introduce a new basis $e_1', \ldots, e_n'$ by setting

$$e_i' = \frac{1}{\sqrt{\lambda_i}}\, e_i \ \text{ for } i \leq r, \qquad e_i' = e_i \ \text{ for } i > r,$$

since $\sqrt{\lambda_i}$ is again a complex number, nonzero for $i \leq r$. In the new basis,
$\varphi(e_i', e_j') = 0$ for all $i \neq j$, and $\varphi(e_i', e_i') = \lambda_i / \lambda_i = 1$ for $i \leq r$, $\varphi(e_i', e_i') = 0$ for $i > r$.
This implies that the quadratic form can be written in this basis in the form

$$\psi(x) = x_1^2 + \cdots + x_r^2, \tag{6.27}$$

where $x_1, \ldots, x_r$ are the first $r$ coordinates of the vector $x$ in the new basis. We see,
then, that in a complex space $\mathsf{L}$, every quadratic form can be brought into the canonical form
(6.27), and all quadratic forms (and therefore also symmetric matrices) of a given
rank are equivalent.

We now consider the case of a real vector space $\mathsf{L}$. By Theorem 6.10, an arbitrary
quadratic form $\psi$ can again be written in the form

$$\psi(x) = \lambda_1 x_1^2 + \cdots + \lambda_r x_r^2,$$

where all the $\lambda_i$ are nonzero and $r$ is the rank of the form $\psi$. But we cannot
proceed so simply as in the complex case by rescaling $e_i$ by $1/\sqrt{\lambda_i}$, since for $\lambda_i < 0$, the
number $\lambda_i$ does not have a real square root. Therefore, we must consider separately
among the numbers $\lambda_1, \ldots, \lambda_r$, those that are positive and those that are negative.
Again changing the numeration of the vectors of the basis as necessary, we may
assume that $\lambda_1, \ldots, \lambda_s$ are positive, and that $\lambda_{s+1}, \ldots, \lambda_r$ are negative. Now we can
introduce a new basis by setting

$$e_i' = \frac{1}{\sqrt{\lambda_i}}\, e_i \ \text{ for } i \leq s, \qquad e_i' = \frac{1}{\sqrt{-\lambda_i}}\, e_i \ \text{ for } i = s+1, \ldots, r, \qquad e_i' = e_i \ \text{ for } i > r.$$

In this basis, for the bilinear form $\varphi$, we have $\varphi(e_i', e_j') = 0$ for $i \neq j$, and $\varphi(e_i', e_i') = 1$
for $i = 1, \ldots, s$, $\varphi(e_i', e_i') = -1$ for $i = s+1, \ldots, r$, and the quadratic form $\psi$ will
thus be brought into the form

$$\psi(x) = x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_r^2. \tag{6.28}$$

Let us note one important special case.

Definition 6.16 A real quadratic form $\psi(x)$ is said to be positive definite if $\psi(x) > 0$
for every $x \neq 0$ and negative definite if $\psi(x) < 0$ for every $x \neq 0$.

It is obvious that these notions are connected by a simple relationship: a form
$\psi(x)$ is negative definite if and only if the form $-\psi(x)$ is positive definite.
Therefore, in the sequel, it will suffice to establish the basic properties of positive
definite forms only, and the corresponding properties of negative definite forms will
be obtained automatically.

Written in the form (6.28), a quadratic form on an $n$-dimensional vector space
will be positive definite if $s = n$, and negative definite if $s = 0$ and $r = n$.

The fundamental property of real quadratic forms is stated in the following
theorem.

Theorem 6.17 For every basis in terms of which the real quadratic form $\psi$ can be
written in the form (6.28), the number $s$ always has one and the same value.

Proof Let us characterize $s$ in a way that does not depend on reducing the quadratic
form $\psi$ to the form (6.28). Namely, let us prove that $s$ is equal to the largest
dimension among subspaces $\mathsf{L}' \subset \mathsf{L}$ such that the restriction of $\psi$ to $\mathsf{L}'$ is a positive
definite quadratic form. To this end, we note first of all that for an arbitrary basis
in which the form takes the form of (6.28), it is possible to find a subspace $\mathsf{L}'$ of
dimension $s$ on which the restriction of the form $\psi$ gives a positive definite form.
Namely, if the form $\psi(x)$ is written in the form (6.28) in the basis $e_1, \ldots, e_n$, then
we set $\mathsf{L}' = \langle e_1, \ldots, e_s \rangle$. It is obvious that the restriction of the form $\psi$ to $\mathsf{L}'$ gives a
positive definite quadratic form. Similarly, we may consider the set of vectors $\mathsf{L}''$ for
which in the decomposition (6.28), the first $s$ coordinates are equal to zero: $x_1 = 0$,
$\ldots$, $x_s = 0$. It is clear that this set is the vector subspace $\mathsf{L}'' = \langle e_{s+1}, e_{s+2}, \ldots, e_n \rangle$,
and for an arbitrary vector $x \in \mathsf{L}''$, we have the inequality $\psi(x) \leq 0$.

Let us suppose that there exists a subspace $\mathsf{M} \subset \mathsf{L}$ of dimension $m > s$ such that
the restriction of $\psi$ to $\mathsf{M}$ gives a positive definite quadratic form. It is then obvious
that $\dim \mathsf{M} + \dim \mathsf{L}'' = m + n - s > n$. By Corollary 3.42, the subspaces $\mathsf{M}$ and
$\mathsf{L}''$ must have a common vector $x \neq 0$. But since $x \in \mathsf{L}''$, it follows that $\psi(x) \leq 0$,
and since $x \in \mathsf{M}$, we have $\psi(x) > 0$. This contradiction completes the proof of the
theorem. □
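
The law of inertia can be observed on a small example. In the sketch below (our illustration; the helper names `congruent` and `signs` and the matrices `C1`, `C2` are assumptions chosen for this example), the form $\psi(x) = 2x_1x_2$ is diagonalized by two different nonsingular matrices, whose columns are the vectors of the new bases. The diagonal entries differ, but the numbers of positive and negative entries agree.

```python
from fractions import Fraction

def congruent(C, A):
    """Return C^T * A * C, the matrix of the form after the change of basis C."""
    n = len(A)
    return [[sum(C[k][i] * A[k][l] * C[l][j] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def signs(D):
    """Count positive and negative diagonal entries of a diagonal matrix."""
    diag = [D[i][i] for i in range(len(D))]
    return sum(d > 0 for d in diag), sum(d < 0 for d in diag)

A = [[0, 1], [1, 0]]                       # psi(x) = 2*x1*x2
C1 = [[1, 1], [1, -1]]                     # columns are the new basis vectors
C2 = [[Fraction(1, 2), 3], [1, -6]]        # a second nonsingular change of basis
D1, D2 = congruent(C1, A), congruent(C2, A)
print(D1, signs(D1))
print(D2, signs(D2))
```

Both changes of basis give one positive and one negative diagonal entry, in agreement with Theorem 6.17.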

Definition 6.18 The number $s$ from Theorem 6.17 that is the same no matter how
a quadratic form is brought into the form (6.28) is called the index of inertia of the
quadratic form $\psi$. In connection with this, Theorem 6.17 is often called the law of
inertia.

Positive definite quadratic forms play an important role in the theory that we
are expounding. By the theory developed thus far, to establish whether a quadratic
form is positive definite, it is necessary to reduce it to canonical form and verify
whether the relationship $s = n$ is satisfied. However, there is a criterion that makes it
possible to determine positive definiteness from the matrix of the associated bilinear
form written in an arbitrary basis. Suppose this matrix in the basis $e_1, \ldots, e_n$ has the
form

$$\Phi = (\varphi_{ij}), \quad \text{where } \varphi_{ij} = \varphi(e_i, e_j).$$

The minor $\Delta_i$ of the matrix $\Phi$ at the intersection of the first $i$ rows and first $i$
columns is called a leading principal minor.

Theorem 6.19 (Sylvester's criterion) A quadratic form $\psi$ is positive definite if and
only if all leading principal minors of the matrix of the associated bilinear form are
positive.

Proof We shall show that if a quadratic form is positive definite, then all the $\Delta_i$
are positive. We note as well that $\Delta_n = |\Phi|$ is the determinant of the matrix of the
form $\varphi$. In some basis, the form $\psi$ is in canonical form, that is, its matrix in this
basis has the form

$$\Phi' = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

Since the quadratic form $\psi$ is positive definite, it follows that all the $\lambda_i$ are greater
than 0, and clearly, $|\Phi'| > 0$. In view of formula (6.26) for replacing the matrix of a
bilinear form by a change of basis along with the equality $|C^*| = |C|$, we obtain the
relationship $|\Phi'| = |\Phi| \cdot |C|^2$, from which it follows that $\Delta_n = |\Phi| > 0$. Let us now
consider the subspaces $\mathsf{L}_i = \langle e_1, \ldots, e_i \rangle \subset \mathsf{L}$ of dimension $i \geq 1$. The restriction
of the quadratic form $\psi(x)$ to $\mathsf{L}_i$ is clearly also a positive definite form. But the
determinant of its matrix in the basis $e_1, \ldots, e_i$ is equal to $\Delta_i$. Therefore, $\Delta_i > 0$,
by what we have just shown.

Let us now show that conversely, from the condition $\Delta_i > 0$ for all $i = 1, \ldots, n$,
it follows that the quadratic form $\psi$ is positive definite. We shall prove this by
induction on the dimension $n$ of the space $\mathsf{L}$. For $n = 1$, the assertion is clear, since
$\psi(x_1 e_1) = \Delta_1 x_1^2$.

It is clear that $\mathsf{L}_i \subset \mathsf{L}$ for $i = 1, \ldots, n-1$, and the leading principal minors $\Delta_i$
of the matrix of the form $\psi$ restricted to the subspace $\mathsf{L}_{n-1}$, written in the basis
$e_1, \ldots, e_{n-1}$, are the same as for the form $\varphi$ in $\mathsf{L}$. Therefore, the restriction of the
quadratic form $\psi$ to $\mathsf{L}_{n-1}$ may be assumed positive definite by the induction hypothesis. Consequently,
the restriction of $\varphi(x, y)$ to the subspace $\mathsf{L}_{n-1}$ is a nonsingular bilinear form, and so by
Theorem 6.9, we have the decomposition $\mathsf{L} = \mathsf{L}_{n-1} \oplus (\mathsf{L}_{n-1})^{\perp}_{\varphi}$, where $\dim \mathsf{L}_{n-1} = n - 1$
and $\dim (\mathsf{L}_{n-1})^{\perp}_{\varphi} = 1$. We may therefore express the vector $e_n$ in the form

$$e_n = f_n + y, \quad \text{where } y \in \mathsf{L}_{n-1}, \ f_n \in (\mathsf{L}_{n-1})^{\perp}_{\varphi}. \tag{6.29}$$

We may represent an arbitrary vector $x \in \mathsf{L}$ as a linear combination of vectors of the
basis $e_1, \ldots, e_n$, that is, in the form $x = x_1 e_1 + \cdots + x_{n-1} e_{n-1} + x_n e_n = u + x_n e_n$,
where $u \in \mathsf{L}_{n-1}$. Substituting the expression (6.29) and setting $u + x_n y = v$, we
obtain

$$x = v + x_n f_n, \quad \text{where } v \in \mathsf{L}_{n-1}, \ f_n \in (\mathsf{L}_{n-1})^{\perp}_{\varphi}. \tag{6.30}$$

This implies that the vectors $v$ and $f_n$ are orthogonal with respect to the bilinear
form $\varphi$, that is, $\varphi(v, f_n) = 0$, and therefore, from the decomposition (6.30), we
derive the equality

$$\psi(x) = \psi(v) + x_n^2\, \psi(f_n). \tag{6.31}$$

We see, then, that in the basis $e_1, \ldots, e_{n-1}, f_n$, the matrix of the bilinear form $\varphi$
takes the form

$$\begin{pmatrix} & & & 0 \\ & \Phi' & & \vdots \\ & & & 0 \\ 0 & \cdots & 0 & \psi(f_n) \end{pmatrix},$$

where $\Phi'$ is the matrix of the restriction of $\varphi$ to $\mathsf{L}_{n-1}$ in the basis $e_1, \ldots, e_{n-1}$, and
for its determinant $D_n$, we obtain the expression $D_n = |\Phi'| \cdot \psi(f_n)$. By the
change-of-basis formula (6.26), $D_n = \Delta_n |C|^2 > 0$, and $|\Phi'| = \Delta_{n-1} > 0$; it then follows
that $\psi(f_n) > 0$. By the induction hypothesis, the term $\psi(v)$ in formula (6.31) is positive
for $v \neq 0$. Since $x \neq 0$ implies that $v \neq 0$ or $x_n \neq 0$, we obtain $\psi(x) > 0$ for every
$x \neq 0$. □
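
Sylvester's criterion translates directly into a computation with determinants. The following sketch is ours and makes simplifying assumptions: the matrix has rational entries, `det` is a hypothetical helper based on Gaussian elimination, and the two matrices are arbitrary examples.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(a) for a in row] for row in M]
    n, result = len(M), Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            result = -result          # a row swap changes the sign
        result *= M[k][k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            M[i] = [a - factor * b for a, b in zip(M[i], M[k])]
    return result

def leading_minors(A):
    """Leading principal minors Delta_1, ..., Delta_n of a square matrix."""
    return [det([row[:i] for row in A[:i]]) for i in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Sylvester's criterion for a symmetric matrix A."""
    return all(m > 0 for m in leading_minors(A))

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
B = [[1, 2], [2, 1]]
print(leading_minors(A), is_positive_definite(A))
print(leading_minors(B), is_positive_definite(B))
```

For the matrix `B`, the form is $x_1^2 + 4x_1x_2 + x_2^2$, which takes the negative value $-2$ at the vector $(1, -1)$, consistent with the negative second minor.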

Example 6.20 Sylvester's criterion has a beautiful application to the properties of
algebraic equations. Consider a polynomial $f(t)$ of degree $n$ with real coefficients,
about which we shall assume that its roots (real or complex) $z_1, \ldots, z_n$ are distinct.
For each root $z_k$, we consider the linear form

$$l_k(x) = x_1 + x_2 z_k + \cdots + x_n z_k^{n-1} \tag{6.32}$$

and likewise the quadratic form

$$\psi(x) = \sum_{k=1}^{n} l_k^2(x_1, \ldots, x_n), \tag{6.33}$$

where $x = (x_1, \ldots, x_n)$.

Although among the roots $z_k$ there may be some that are complex, the quadratic
form (6.33) is always real. This is obvious for the terms $l_k^2$ corresponding to the
real roots $z_k$. Now, as regards the complex roots, it is well known that they come
in complex conjugate pairs. Let $z_k$ and $z_j$ be complex conjugates of each other.
Separating the coefficients of the linear form $l_k$ into real and imaginary parts, we
can write it in the form $l_k = u_k + i v_k$, where $u_k$ and $v_k$ are linear forms with real
coefficients. Then $l_j = u_k - i v_k$, and for this pair of complex conjugate roots, we
have the sum $l_k^2 + l_j^2 = 2u_k^2 - 2v_k^2$, which is a real quadratic form.

Thus the quadratic form (6.33) is real, and we have the following important
criterion.

Theorem 6.21 All the roots of a polynomial $f(t)$ are real if and only if the quadratic
form (6.33) is positive definite.

Proof If all the roots $z_k$ are real, then all the linear forms $l_k$ of (6.32) are real, and
the sum on the right-hand side of (6.33) contains only nonnegative terms. It is clear
that it is equal to zero only if $l_k(x) = 0$ for all $k = 1, \ldots, n$. This condition gives us
a system consisting of $n$ linear homogeneous equations in $n$ unknowns $x_1, \ldots, x_n$.
From formula (6.32), it is easy to see that the determinant of the matrix of this
system is known to us already as a Vandermonde determinant; see formulas (2.32)
and (2.33). It is different from zero, since all the roots $z_k$ are distinct, and hence this
system has only the null solution. This implies that $\psi(x) \geq 0$ and $\psi(x) = 0$ if and
only if $x = 0$, that is, the quadratic form (6.33) is positive definite.

Let us now prove the converse assertion. Let the quadratic form (6.33) be positive
definite, and suppose the polynomial $f(t)$ has $r$ real roots and $p$ pairs of complex
conjugate roots, so that $r + 2p = n$. Then as we have seen,

$$\psi(x) = \sum_{k=1}^{r} l_k^2(x) + 2 \sum_{j=1}^{p} \bigl( u_j^2(x) - v_j^2(x) \bigr), \tag{6.34}$$

where the first sum extends over all real roots, and the second sum is over all pairs
of complex conjugate roots.

Let us now show that if $p > 0$, then there exists a vector $x \neq 0$ such that

$$l_1(x) = 0, \ \ldots, \ l_r(x) = 0, \quad u_1(x) = 0, \ \ldots, \ u_p(x) = 0.$$

These equalities represent a system of $r + p$ linear homogeneous equations in $n$
unknowns $x_1, \ldots, x_n$. Since the number of equations $r + p$ is less than $r + 2p = n$,
it follows that this system has a nontrivial solution $x = (x_1, \ldots, x_n)$, for which the
quadratic form (6.34) takes the form

$$\psi(x) = -2 \sum_{j=1}^{p} v_j^2(x) \leq 0,$$

and moreover, the equality $\psi(x) = 0$ is possible only if $v_j(x) = 0$ for all $j = 1, \ldots, p$.
Either way, we have found a vector $x \neq 0$ with $\psi(x) \leq 0$, which contradicts the
positive definiteness of the form (6.33). Hence the assumption $p > 0$, that is, that the
polynomial $f(t)$ has at least one complex root, is impossible. □

The form (6.33) can be calculated explicitly, and then we can apply Sylvester's
criterion to it. To this end, we observe that the coefficient of the monomial $x_k^2$ on
the right-hand side of (6.33) is equal to $s_{2(k-1)} = z_1^{2(k-1)} + \cdots + z_n^{2(k-1)}$, while the
coefficient of the monomial $x_i x_j$ (where $i \neq j$) is equal to $2 s_{i+j-2} = 2(z_1^{i+j-2} + \cdots + z_n^{i+j-2})$.
The sums $s_k = \sum_{i=1}^{n} z_i^k$ are called Newton sums. It is known from
the theory of symmetric functions that they can be expressed as polynomials in the
coefficients of $f(t)$. Thus the matrix of a symmetric bilinear form associated with a
quadratic form (6.33) has the form

$$\begin{pmatrix} s_0 & s_1 & \cdots & s_{n-1} \\ s_1 & s_2 & \cdots & s_n \\ \vdots & \vdots & \ddots & \vdots \\ s_{n-1} & s_n & \cdots & s_{2n-2} \end{pmatrix}.$$

Applying Sylvester's criterion to the form (6.33), we obtain the following result: all
(distinct) roots of the polynomial $f(t)$ are real if and only if the following inequality
holds for all $i = 1, \ldots, n$:

$$\begin{vmatrix} s_0 & s_1 & \cdots & s_{i-1} \\ s_1 & s_2 & \cdots & s_i \\ \vdots & \vdots & \ddots & \vdots \\ s_{i-1} & s_i & \cdots & s_{2i-2} \end{vmatrix} > 0.$$

To illustrate this assertion, let us consider the simplest case, $n = 2$. Let $f(t) = t^2 + pt + q$.
Then for the roots of the polynomial $f(t)$ to be real and distinct is
equivalent to the following two inequalities:

$$s_0 > 0, \qquad \begin{vmatrix} s_0 & s_1 \\ s_1 & s_2 \end{vmatrix} > 0. \tag{6.35}$$



The first of these is satisfied for every polynomial, since $s_0$ is simply its degree. If
the roots of the polynomial $f(t)$ are $\alpha$ and $\beta$, then

$$s_0 = 2, \quad s_1 = \alpha + \beta = -p, \quad s_2 = \alpha^2 + \beta^2 = (\alpha + \beta)^2 - 2\alpha\beta = p^2 - 2q,$$

and inequality (6.35) yields $2(p^2 - 2q) - p^2 = p^2 - 4q > 0$. This is a criterion
that one learns in secondary school: the roots of a quadratic trinomial are real and
distinct if and only if the discriminant is positive.
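
The same test can be carried out for polynomials of higher degree without finding the roots. The sketch below is our illustration under stated assumptions: the polynomial is monic with rational coefficients, given as the list `a` of coefficients of $t^n + a_1 t^{n-1} + \cdots + a_n$; the Newton sums are computed from Newton's identities (a standard fact about symmetric functions that the text mentions but does not write out); and the helper names are hypothetical.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(a) for a in row] for row in M]
    n, result = len(M), Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            M[i] = [a - factor * b for a, b in zip(M[i], M[k])]
    return result

def newton_sums(a, count):
    """Power sums s_0, ..., s_{count-1} of the roots of t^n + a[0] t^(n-1) + ... + a[n-1]."""
    n = len(a)
    s = [Fraction(n)]
    for k in range(1, count):
        # Newton's identities: s_k + a_1 s_{k-1} + ... (+ k a_k when k <= n) = 0.
        total = sum(a[i - 1] * s[k - i] for i in range(1, min(k - 1, n) + 1))
        if k <= n:
            total += k * a[k - 1]
        s.append(-Fraction(total))
    return s

def roots_real_and_distinct(a):
    """Sylvester's criterion applied to the matrix (s_{i+j}) of the form (6.33)."""
    n = len(a)
    s = newton_sums(a, 2 * n - 1)
    H = [[s[i + j] for j in range(n)] for i in range(n)]
    return all(det([row[:i] for row in H[:i]]) > 0 for i in range(1, n + 1))

print(roots_real_and_distinct([-6, 11, -6]))   # t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)
print(roots_real_and_distinct([0, 0, -1]))     # t^3 - 1 has two complex roots
print(roots_real_and_distinct([-2, 1]))        # t^2 - 2t + 1 has a double root
```

When the roots are not distinct, the largest minor vanishes, so the function returns `False` in that case as well; this matches the case $n = 2$ above, where the criterion characterizes real and distinct roots.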

We return now to complex vector spaces and consider certain functions in them 
that are more natural analogues of bilinear and quadratic forms than those examined 
at the beginning of this section. 


Definition 6.22 A function $f(x)$ defined on a complex vector space $\mathsf{L}$ and taking
complex values is said to be semilinear if it possesses the following properties:

$$f(x + y) = f(x) + f(y), \qquad f(\alpha x) = \bar{\alpha} f(x), \tag{6.36}$$

for arbitrary vectors $x$ and $y$ in the space $\mathsf{L}$ and complex scalar $\alpha$ (here and below,
$\bar{\alpha}$ denotes the complex conjugate of $\alpha$).

It is clear that for every choice of basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$, a semilinear
function can be written in the form

$$f(x) = \bar{x}_1 y_1 + \cdots + \bar{x}_n y_n,$$

where the vector $x$ is equal to $x_1 e_1 + \cdots + x_n e_n$, and the scalars $y_i$ are equal to
$f(e_i)$.

Definition 6.23 A function $\varphi(x, y)$ of two vectors in the complex vector space $\mathsf{L}$ is
said to be sesquilinear if it is linear as a function of $x$ for fixed $y$ and semilinear as
a function of $y$ for fixed $x$.

The terminology "sesquilinear" indicates the "full" linearity of the first argument
and semilinearity of the second. Semilinear and sesquilinear functions are also
frequently called forms. In the sequel, we shall also use such a designation.

It is obvious that for an arbitrary choice of basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$, a
sesquilinear form can be written in the form

$$\varphi(x, y) = \sum_{i,j=1}^{n} \varphi_{ij}\, x_i \bar{y}_j, \quad \text{where } \varphi_{ij} = \varphi(e_i, e_j), \tag{6.37}$$

and the vectors $x$ and $y$ are given by $x = x_1 e_1 + \cdots + x_n e_n$ and $y = y_1 e_1 + \cdots + y_n e_n$.
As in the case of a bilinear form, the matrix $\Phi = (\varphi_{ij})$ with elements $\varphi_{ij} = \varphi(e_i, e_j)$
as defined above is called the matrix of the sesquilinear form $\varphi(x, y)$ in
the chosen basis.

Definition 6.24 A sesquilinear form $\varphi(x, y)$ is said to be Hermitian if

$$\varphi(y, x) = \overline{\varphi(x, y)} \tag{6.38}$$

for arbitrary choice of vectors $x$ and $y$.

It is obvious that in the expression (6.37), the Hermitian nature of the form
$\varphi(x, y)$ is expressed by the property $\varphi_{ij} = \overline{\varphi_{ji}}$ of the coefficients $\varphi_{ij}$ of its
matrix $\Phi$, that is, by the relationship $\Phi = \overline{\Phi}^{\,*}$. A matrix exhibiting these properties is
also called Hermitian.

After separating real and imaginary parts in $\varphi(x, y)$, we obtain

$$\varphi(x, y) = u(x, y) + i\, v(x, y), \tag{6.39}$$

where $u(x, y)$ and $v(x, y)$ are functions of two vectors $x$ and $y$ of the complex
space $\mathsf{L}$ taking real values. In the space $\mathsf{L}$, multiplication by a real scalar is also
defined, and so it may be viewed as a real vector space. We shall denote this real
vector space by $\mathsf{L}_{\mathbb{R}}$. Obviously, in the space $\mathsf{L}_{\mathbb{R}}$, the functions $u(x, y)$ and $v(x, y)$
are bilinear, and the property of the complex form $\varphi(x, y)$ being Hermitian implies
that on $\mathsf{L}_{\mathbb{R}}$, the bilinear form $u(x, y)$ is symmetric, while $v(x, y)$ is antisymmetric.

Definition 6.25 A function $\psi(x)$ on a complex vector space $\mathsf{L}$ is said to be
quadratic Hermitian if it can be expressed in the form

$$\psi(x) = \varphi(x, x) \tag{6.40}$$

for some Hermitian form $\varphi(x, y)$.

From the definition of Hermitian form, it follows at once that the values of
quadratic Hermitian functions are real.

Theorem 6.26 A quadratic Hermitian function $\psi(x)$ uniquely determines a
Hermitian sesquilinear form $\varphi(x, y)$ as presented in (6.40).

Proof By the definition of sesquilinearity and the Hermitian property (6.38), we have

$$\psi(x + y) = \psi(x) + \psi(y) + \varphi(x, y) + \overline{\varphi(x, y)}. \tag{6.41}$$

Substituting here the expression (6.39), we obtain that

$$u(x, y) = \tfrac{1}{2} \bigl( \psi(x + y) - \psi(x) - \psi(y) \bigr). \tag{6.42}$$

Similarly, from the relationship

$$\psi(x + iy) = \psi(x) + \psi(iy) + \varphi(x, iy) + \varphi(iy, x) \tag{6.43}$$

we obtain by the properties of being Hermitian and sesquilinearity that

$$\varphi(x, iy) = -i\, \varphi(x, y), \qquad \varphi(iy, x) = \overline{\varphi(x, iy)},$$

which yields

$$v(x, y) = \tfrac{1}{2} \bigl( \psi(x + iy) - \psi(x) - \psi(iy) \bigr). \tag{6.44}$$

The expressions (6.42) and (6.44) thus obtained complete the proof of the
theorem. □
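
Formulas (6.42) and (6.44) can be checked numerically. The sketch below is only an illustration of ours: the Hermitian matrix `Phi` and the vectors `x`, `y` are arbitrary choices, and the helper names are hypothetical. It rebuilds the value $\varphi(x, y)$ using nothing but values of the quadratic Hermitian function $\psi$.

```python
def phi(Phi, x, y):
    """Sesquilinear form (6.37): linear in x, semilinear in y."""
    n = len(Phi)
    return sum(Phi[i][j] * x[i] * y[j].conjugate() for i in range(n) for j in range(n))

def psi(Phi, x):
    """Quadratic Hermitian function (6.40)."""
    return phi(Phi, x, x)

def recover(Phi, x, y):
    """Rebuild phi(x, y) from values of psi only, via (6.42) and (6.44)."""
    x_plus_y = [a + b for a, b in zip(x, y)]
    x_plus_iy = [a + 1j * b for a, b in zip(x, y)]
    iy = [1j * b for b in y]
    u = (psi(Phi, x_plus_y) - psi(Phi, x) - psi(Phi, y)) / 2
    v = (psi(Phi, x_plus_iy) - psi(Phi, x) - psi(Phi, iy)) / 2
    return u + 1j * v

Phi = [[2, 1 - 1j], [1 + 1j, 3]]       # Hermitian: Phi[i][j] == conj(Phi[j][i])
x, y = [1 + 2j, -1j], [3, 2 - 1j]
print(phi(Phi, x, y), recover(Phi, x, y))
```

The two printed values agree, and `psi(Phi, x)` has zero imaginary part, as the remark after Definition 6.25 predicts.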

Theorem 6.27 A sesquilinear form $\varphi(x, y)$ is Hermitian if and only if the function
$\psi(x)$ associated with it by relationship (6.40) assumes only real values.

Proof If a sesquilinear form $\varphi(x, y)$ is Hermitian, then by definition (6.38), we
have the equality $\varphi(x, x) = \overline{\varphi(x, x)}$ for all $x \in \mathsf{L}$, from which it follows that for an
arbitrary vector $x \in \mathsf{L}$, the value $\psi(x)$ is a real number.

On the other hand, if the values of the function $\psi(x)$ are real, then arguing just
as we did in the derivation of formula (6.41) in the proof of Theorem 6.26, but using
only sesquilinearity and not (6.38), we obtain that the value

$$\psi(x + y) - \psi(x) - \psi(y) = \varphi(x, y) + \varphi(y, x)$$

is real. Substituting here the expression (6.39), we see that the sum $v(x, y) + v(y, x)$
is equal to zero, that is, the function $v(x, y)$ is antisymmetric.

Reasoning similarly, from formula (6.43), we conclude that the value

$$\psi(x + iy) - \psi(x) - \psi(iy) = \varphi(x, iy) + \varphi(iy, x)$$

is also real. From the definition of semilinearity and sesquilinearity, we have the
relationships $\varphi(iy, x) = i\, \varphi(y, x)$ and $\varphi(x, iy) = -i\, \varphi(x, y)$. We thereby obtain that
the number

$$i \bigl( \varphi(y, x) - \varphi(x, y) \bigr)$$

is real, which by virtue of the expression (6.39) gives the equality $u(y, x) - u(x, y) = 0$;
that is, the function $u(x, y)$ is symmetric. Consequently, the form
$\varphi(x, y)$ is Hermitian. □

Hermitian forms are the most natural complex analogues of symmetric forms.
They exhibit analogous properties to those that we derived for symmetric forms in
real vector spaces (with completely analogous proofs), namely reduction to canonical
form, the law of inertia, the notion of positive definiteness, and Sylvester's
criterion.


Chapter 7 

Euclidean Spaces 


The notions entering into the definition of a vector space do not provide a way of
formulating multidimensional analogues of the length of a vector, the angle between
vectors, and volumes. Yet such concepts appear in many branches of mathematics
and physics, and we shall study such concepts in this chapter. All the vector spaces
that we shall consider here will be real (with the exception of certain special cases in
which complex vector spaces will be considered as a means of studying real spaces).


7.1 The Definition of a Euclidean Space

Definition 7.1 A Euclidean space is a real vector space on which is defined a fixed
symmetric bilinear form whose associated quadratic form is positive definite.

The vector space itself will be denoted as a rule by $\mathsf{L}$, and the fixed symmetric
bilinear form will be denoted by $(x, y)$. Such an expression is also called the inner
product of the vectors $x$ and $y$. Let us now reformulate the definition of a Euclidean
space using this terminology.

A Euclidean space is a real vector space $\mathsf{L}$ in which to every pair of vectors $x$
and $y$ there corresponds a real number $(x, y)$ such that the following conditions are
satisfied:

(1) $(x_1 + x_2, y) = (x_1, y) + (x_2, y)$ for all vectors $x_1, x_2, y \in \mathsf{L}$.

(2) $(\alpha x, y) = \alpha (x, y)$ for all vectors $x, y \in \mathsf{L}$ and real number $\alpha$.

(3) $(x, y) = (y, x)$ for all vectors $x, y \in \mathsf{L}$.

(4) $(x, x) > 0$ for $x \neq 0$.

Properties (1)-(3) show that the function $(x, y)$ is a symmetric bilinear form on
$\mathsf{L}$, and in particular, that $(0, y) = 0$ for every vector $y \in \mathsf{L}$. It is only property (4) that
expresses the specific character of a Euclidean space.

The expression $(x, x)$ is frequently denoted by $(x^2)$; it is called the scalar square
of the vector $x$. Thus property (4) implies that the quadratic form corresponding to
the bilinear form $(x, y)$ is positive definite.

Let us point out some obvious consequences of these definitions. For a fixed
vector $y \in \mathsf{L}$, where $\mathsf{L}$ is a Euclidean space, conditions (1) and (2) in the definition
say that the function $f_y(x) = (x, y)$ with argument $x$ is
linear. Thus we have a mapping $y \mapsto f_y$ of the vector space $\mathsf{L}$ to $\mathsf{L}^*$. Condition (4)
in the definition of Euclidean space shows that the kernel of this mapping is equal
to $(0)$. Indeed, $f_y \neq 0$ for every $y \neq 0$, since $f_y(y) = (y^2) > 0$. If the dimension
of the space $\mathsf{L}$ is finite, then by Theorems 3.68 and 3.78, this mapping is an
isomorphism. Moreover, we should note that in contrast to the construction used for
proving Theorem 3.78, we have now constructed an isomorphism $\mathsf{L} \xrightarrow{\sim} \mathsf{L}^*$ without
using the specific choice of a basis in $\mathsf{L}$. Thus we have a certain natural
isomorphism $\mathsf{L} \xrightarrow{\sim} \mathsf{L}^*$ defined only by the imposition of an inner product on $\mathsf{L}$. In view of
this, in the case of a finite-dimensional Euclidean space $\mathsf{L}$, we shall in what follows
sometimes identify $\mathsf{L}$ and $\mathsf{L}^*$. In other words, as for any bilinear form, for the
inner product $(x, y)$ there exists a unique linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$ such that
$(x, y) = \mathcal{A}(y)(x)$. The previous reasoning shows that in the case of a Euclidean
space, the transformation $\mathcal{A}$ is an isomorphism, and in particular, the bilinear form
$(x, y)$ is nonsingular. Let us give some examples of Euclidean spaces.

Example 7.2 The plane, in which for $(x, y)$ is taken the well-known inner product
of $x$ and $y$ as studied in analytic geometry, that is, the product of the vectors' lengths
and the cosine of the angle between them, is a Euclidean space.

Example 7.3 The space $\mathbb{R}^n$ consisting of rows (or columns) of length $n$, in which
the inner product of rows $x = (\alpha_1, \ldots, \alpha_n)$ and $y = (\beta_1, \ldots, \beta_n)$ is defined by the
relation

$$(x, y) = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \cdots + \alpha_n \beta_n, \tag{7.1}$$

is a Euclidean space.

Example 7.4 The vector space $\mathsf{L}$ consisting of polynomials of degree at most $n$
with real coefficients, defined on some interval $[a, b]$, is a Euclidean space. For two
polynomials $f(t)$ and $g(t)$, their inner product is defined by the relation

$$(f, g) = \int_a^b f(t)\, g(t)\, dt. \tag{7.2}$$

Example 7.5 The vector space $\mathsf{L}$ consisting of all real-valued continuous functions
on the interval $[a, b]$ is a Euclidean space. For two such functions $f(t)$ and $g(t)$, we
shall define their inner product by equality (7.2).
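
For polynomials, the inner product (7.2) can be evaluated exactly by integrating term by term. The following sketch is ours and rests on assumptions made only for the illustration: polynomials are stored as coefficient lists with the lowest degree first, the endpoints of the interval are rational, and `poly_inner` is a hypothetical helper name.

```python
from fractions import Fraction

def poly_inner(f, g, a, b):
    """Inner product (7.2): the integral of f(t)*g(t) over [a, b]."""
    product = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            product[i + j] += Fraction(fi) * Fraction(gj)
    # Integrate term by term: t^k contributes (b^(k+1) - a^(k+1)) / (k + 1).
    return sum(c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in enumerate(product))

one, t = [1], [0, 1]
print(poly_inner(one, t, -1, 1))   # the polynomials 1 and t on [-1, 1]
print(poly_inner(t, t, -1, 1))     # the scalar square of t on [-1, 1]
```

On the interval $[-1, 1]$, the inner product of $1$ and $t$ is zero, while the scalar square of $t$ is positive, in accordance with property (4).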

Example 7.5 shows that a Euclidean space, like a vector space, does not hâve to 
be finite-dimensional. 1 In the sequel, we shall be concerned exclusively with finite- 
dimensional Euclidean spaces, on which the inner product is sometimes called the 


¹ Infinite-dimensional Euclidean spaces are usually called pre-Hilbert spaces. An especially
important role in a number of branches of mathematics and physics is played by the so-called Hilbert

Fig. 7.1 Orthogonal projection



scalar product (because the inner product of two vectors is a scalar) or dot product 
(because the notation x • y is frequently used instead of (x, y)). 

Example 7.6 Every subspace L' of a Euclidean space L is itself a Euclidean space if
we define on it the form (x, y) exactly as on the space L.

In analogy with Example 7.2, we make the following definition.

Definition 7.7 The length of a vector x in a Euclidean space is the nonnegative
value $\sqrt{(x^2)}$. The length of a vector x is denoted by |x|.

We note that we have here made essential use of property (4), by which the length
of a nonnull vector is a positive number.

Following the same analogy, it is natural to define the angle $\varphi$ between two
vectors x and y by the condition

\[ \cos\varphi = \frac{(x, y)}{|x| \cdot |y|}, \qquad 0 \le \varphi \le \pi. \tag{7.3} \]

However, such a number $\varphi$ exists only if the expression on the right-hand side of
equality (7.3) does not exceed 1 in absolute value. Such is indeed the case, and the
proof of this fact will be our immediate objective.

Lemma 7.8 Given a vector $e \ne 0$, every vector $x \in L$ can be expressed in the form

\[ x = \alpha e + y, \qquad (e, y) = 0, \tag{7.4} \]

for some scalar $\alpha$ and vector $y \in L$; see Fig. 7.1.

Proof Setting $y = x - \alpha e$, we obtain $\alpha$ from the condition (e, y) = 0. This is
equivalent to $(x, e) = \alpha(e, e)$, which implies that $\alpha = (x, e)/|e|^2$. We remark that $|e| \ne 0$,
since by assumption, $e \ne 0$. □
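The decomposition (7.4) can be checked numerically. What follows is only a minimal sketch, assuming the space of rows from Example 7.3 with the inner product (7.1); the helper names `inner` and `decompose` are illustrative and do not come from the text. Exact fractions are used so that the condition (e, y) = 0 holds without rounding error.

```python
from fractions import Fraction


def inner(x, y):
    """Inner product (7.1) of two rows of equal length."""
    return sum(a * b for a, b in zip(x, y))


def decompose(x, e):
    """Return (alpha, y) with x = alpha*e + y and (e, y) = 0, for e != 0."""
    ee = inner(e, e)
    if ee == 0:
        raise ValueError("e must be a nonzero vector")
    alpha = Fraction(inner(x, e), ee)  # alpha = (x, e) / |e|^2
    y = [xi - alpha * ei for xi, ei in zip(x, e)]
    return alpha, y


# Hypothetical sample data: project x = (3, 4, 0) onto the line spanned by e.
e = [1, 1, 1]
alpha, y = decompose([3, 4, 0], e)
projection = [alpha * ei for ei in e]  # the vector alpha*e
```

Here `projection` plays the role of the vector $\alpha e$ and `y` is the component orthogonal to e.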


spaces, which are pre-Hilbert spaces that have the additional property of completeness, just for
the case of infinite dimension. (Sometimes, in the definition of pre-Hilbert space, the condition
$(x, x) > 0$ is replaced by the weaker condition $(x, x) \ge 0$.)

Definition 7.9 The vector $\alpha e$ from relation (7.4) is called the orthogonal projection
of the vector x onto the line $\langle e \rangle$.


Theorem 7.10 The length of the orthogonal projection of a vector x is at most its
length |x|.

Proof Indeed, since by definition, $x = \alpha e + y$ and (e, y) = 0, it follows that

\[ |x|^2 = (x^2) = (\alpha e + y, \alpha e + y) = |\alpha e|^2 + |y|^2 \ge |\alpha e|^2, \]

and this implies that

\[ |x| \ge |\alpha e|. \tag{7.5} \]

□


This leads directly to the following necessary theorem. 

Theorem 7.11 For arbitrary vectors x and y in a Euclidean space, the following
inequality holds:

\[ |(x, y)| \le |x| \cdot |y|. \tag{7.6} \]

Proof If one of the vectors x, y is equal to zero, then the inequality (7.6) is obvious,
and is reduced to the equality 0 = 0. Now suppose that neither vector is the null
vector. In this case, let us denote by $\alpha y$ the orthogonal projection of the vector
x onto the line $\langle y \rangle$. Then by (7.4), we have the relationship $x = \alpha y + z$, where
(y, z) = 0. From this we obtain the equality

\[ (x, y) = (\alpha y + z, y) = (\alpha y, y) = \alpha |y|^2. \]

This means that $|(x, y)| = |\alpha| \cdot |y|^2 = |\alpha y| \cdot |y|$. But by Theorem 7.10, we have
the inequality $|\alpha y| \le |x|$, and consequently, $|(x, y)| \le |x| \cdot |y|$. □


Inequality (7.6) goes by a number of names, but it is generally known as the
Cauchy-Schwarz inequality. From it we can derive the well-known triangle inequality
from elementary geometry. Indeed, suppose that the vectors $x = \overrightarrow{AB}$, $y = \overrightarrow{BC}$,
$z = \overrightarrow{CA}$ correspond to the sides of a triangle ABC. Then we have the relationship
x + y + z = 0, from which with the help of (7.6) we obtain the inequality

\[ |z|^2 = (x + y, x + y) = |x|^2 + 2(x, y) + |y|^2 \le |x|^2 + 2|(x, y)| + |y|^2
   \le |x|^2 + 2|x| \cdot |y| + |y|^2 = (|x| + |y|)^2, \]

from which clearly follows the triangle inequality $|z| \le |x| + |y|$.

Thus from Theorem 7.11 it follows that there exists a number $\varphi$ that satisfies the
equality (7.3). This number is what is called the angle between the vectors x and y.
Condition (7.3) determines the angle uniquely if we assume that $0 \le \varphi \le \pi$.

Definition 7.12 Two vectors x and y are said to be orthogonal if their inner product
is equal to zero: (x, y) = 0.

Let us note that this repeats the definition given in Sect. 6.2 for a bilinear form
$\varphi(x, y) = (x, y)$. By the definition given above in (7.3), the angle between
orthogonal vectors is equal to $\pi/2$.
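As an illustration of condition (7.3), the angle can be computed directly from the inner product. This is a hedged sketch for the coordinate space of Example 7.3 in floating-point arithmetic; the function names are assumptions made for the example. The clamp to [-1, 1] is only a guard against rounding, since the Cauchy-Schwarz inequality guarantees the exact value already lies in that interval.

```python
import math


def inner(x, y):
    """Inner product (7.1) of two rows of equal length."""
    return sum(a * b for a, b in zip(x, y))


def length(x):
    """Length |x| = sqrt((x, x))."""
    return math.sqrt(inner(x, x))


def angle(x, y):
    """Angle phi in [0, pi] defined by (7.3); both vectors must be nonzero."""
    cosine = inner(x, y) / (length(x) * length(y))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding only
    return math.acos(cosine)


right = angle([1, 0, 0], [0, 2, 0])   # orthogonal vectors
opposite = angle([1, 2], [-2, -4])    # proportional, opposite direction
```

Orthogonal vectors give an angle of $\pi/2$, and oppositely directed proportional vectors give an angle close to $\pi$ (up to floating-point error).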

For a Euclidean space, there is a useful criterion for the linear independence of
vectors. Let $a_1, \dots, a_m$ be m vectors in the Euclidean space L.

Definition 7.13 The Gram determinant, or Gramian, of a system of vectors
$a_1, \dots, a_m$ is the determinant

\[
G(a_1, \dots, a_m) =
\begin{vmatrix}
(a_1, a_1) & (a_1, a_2) & \cdots & (a_1, a_m) \\
(a_2, a_1) & (a_2, a_2) & \cdots & (a_2, a_m) \\
\vdots & \vdots & \ddots & \vdots \\
(a_m, a_1) & (a_m, a_2) & \cdots & (a_m, a_m)
\end{vmatrix}.
\tag{7.7}
\]

Theorem 7.14 If the vectors $a_1, \dots, a_m$ are linearly dependent, then the Gram
determinant $G(a_1, \dots, a_m)$ is equal to zero, while if they are linearly independent,
then $G(a_1, \dots, a_m) > 0$.


Proof If the vectors $a_1, \dots, a_m$ are linearly dependent, then as was shown in
Sect. 3.2, one of the vectors can be expressed as a linear combination of the others.
Let it be the vector $a_m$, that is, $a_m = \alpha_1 a_1 + \cdots + \alpha_{m-1} a_{m-1}$. Then from the
properties of the inner product, it follows that for every i = 1, ..., m, we have the
equality

\[ (a_m, a_i) = \alpha_1 (a_1, a_i) + \alpha_2 (a_2, a_i) + \cdots + \alpha_{m-1} (a_{m-1}, a_i). \]

From this it is clear that if we subtract from the last row of the determinant (7.7) all
the previous rows multiplied by the coefficients $\alpha_1, \dots, \alpha_{m-1}$, then we obtain a
determinant with a row consisting entirely of zeros. Therefore, $G(a_1, \dots, a_m) = 0$.

Now suppose that vectors $a_1, \dots, a_m$ are linearly independent. Let us consider in
the subspace $L' = \langle a_1, \dots, a_m \rangle$ the quadratic form $(x^2)$. Setting
$x = \alpha_1 a_1 + \cdots + \alpha_m a_m$, we may write it in the form

\[ \bigl( (\alpha_1 a_1 + \cdots + \alpha_m a_m)^2 \bigr) = \sum_{i,j=1}^{m} \alpha_i \alpha_j (a_i, a_j). \]

It is easily seen that this quadratic form is positive definite, and its determinant
coincides with the Gram determinant $G(a_1, \dots, a_m)$. By Theorem 6.19, it now follows
that $G(a_1, \dots, a_m) > 0$. □
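Theorem 7.14 gives a mechanical test for linear independence. The sketch below, offered only as an illustration, computes the Gram determinant (7.7) for rows with the inner product (7.1); `det` is a small helper using Laplace expansion (an implementation choice made here for brevity, not a method taken from the text), and the input vectors are hypothetical sample data with integer entries.

```python
def inner(x, y):
    """Inner product (7.1) of two rows of equal length."""
    return sum(a * b for a, b in zip(x, y))


def det(m):
    """Determinant by Laplace expansion along the first row (small sizes only)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def gram(vectors):
    """Gram determinant G(a_1, ..., a_m) from formula (7.7)."""
    return det([[inner(u, v) for v in vectors] for u in vectors])


independent = gram([[1, 0, 0], [1, 1, 0]])
dependent = gram([[1, 2, 3], [4, 5, 6], [5, 7, 9]])  # third row = first + second
```

With integer coordinates the computation is exact, so a dependent system gives exactly zero and an independent one gives a positive value.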

Theorem 7.14 is a broad generalization of the Cauchy-Schwarz inequality. Indeed,
for m = 2, inequality (7.6) is obvious (it becomes an equality) if vectors x
and y are linearly dependent. However, if x and y are linearly independent, then
their Gram determinant is equal to

\[
G(x, y) =
\begin{vmatrix}
(x, x) & (x, y) \\
(x, y) & (y, y)
\end{vmatrix}
= (x, x)(y, y) - (x, y)^2.
\]

The inequality G(x, y) > 0 established in Theorem 7.14 gives us (7.6). In particular,
we see that inequality (7.6) becomes an equality only if the vectors x and y
are proportional. We remark that this is easy to derive if we examine the proof of
Theorem 7.11.


Definition 7.15 Vectors $e_1, \dots, e_m$ in a Euclidean space form an orthonormal
system if

\[ (e_i, e_j) = 0 \ \text{ for } i \ne j, \qquad (e_i, e_i) = 1, \tag{7.8} \]

that is, if these vectors are mutually orthogonal and the length of each of them is
equal to 1. If m = n and the vectors $e_1, \dots, e_n$ form a basis of the space, then such
a basis is called an orthonormal basis.


It is obvious that the Gram determinant of an orthonormal basis is equal to 1.

We shall now use the fact that the quadratic form $(x^2)$ is positive definite and
apply to it formula (6.28), in which by the definition of positive definiteness, s = n.
This result can now be reformulated as an assertion about the existence of a basis
$e_1, \dots, e_n$ of the space L in which the scalar square of a vector
$x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ is equal to the sum of the squares of its coordinates,
that is, $(x^2) = \alpha_1^2 + \cdots + \alpha_n^2$.
In other words, we have the following result.

Theorem 7.16 Every Euclidean space has an orthonormal basis. 

Remark 7.17 In an orthonormal basis, the inner product of $x = (\alpha_1, \dots, \alpha_n)$ and
$y = (\beta_1, \dots, \beta_n)$ has a particularly simple form, given by formula (7.1).
Accordingly, in an orthonormal basis, the scalar square of an arbitrary vector is equal to the
sum of the squares of its coordinates, while its length is equal to the square root of
the sum of the squares.

The lemma establishing the decomposition (7.4) has an important and far-reaching
generalization. To formulate it, we recall that in Sect. 3.7, for every subspace
$L' \subset L$ we defined its annihilator $(L')^a \subset L^*$, while earlier in this section, we
showed that an arbitrary Euclidean space L of finite dimension can be identified
with its dual space L*. As a result, we can view $(L')^a$ as a subspace of the original
space L. In this light, we shall call it the orthogonal complement of the subspace
L' and denote it by $(L')^\perp$. If we recall the relevant definitions, we obtain that the
orthogonal complement $(L')^\perp$ of the subspace $L' \subset L$ consists of all vectors $y \in L$
for which the following condition holds:

\[ (x, y) = 0 \quad \text{for all } x \in L'. \tag{7.9} \]

On the other hand, $(L')^\perp$ is the subspace $(L')^\perp_\varphi$, defined for the case that the bilinear
form $\varphi(x, y)$ is given by $\varphi(x, y) = (x, y)$; see p. 198.

A basic property of the orthogonal complement in a finite-dimensional Euclidean
space is contained in the following theorem.

Theorem 7.18 For an arbitrary subspace $L_1$ of a finite-dimensional Euclidean
space L, the following holds:

\[ L = L_1 \oplus L_1^\perp. \tag{7.10} \]

In the case $L_1 = \langle e \rangle$, Theorem 7.18 follows from Lemma 7.8.

Proof of Theorem 7.18 In the previous chapter, we saw that every quadratic form
$\psi(x)$ in some basis of a vector space L can be reduced to the canonical form (6.22),
and in the case of a real vector space, to the form (6.28) for some scalars $0 \le s \le r$,
where s is the index of inertia and r is the rank of the quadratic form $\psi(x)$, or
equivalently, the rank of the symmetric bilinear form $\varphi(x, y)$ associated with $\psi(x)$
by the relationship (6.11). We recall that a bilinear form $\varphi(x, y)$ is nonsingular if
r = n, where n = dim L.

The condition of positive definiteness for the form $\psi(x)$ is equivalent to the
condition that all scalars $\lambda_1, \dots, \lambda_n$ in (6.22) be positive, or equivalently, that the
equality s = r = n hold in formula (6.28). From this it follows that a symmetric
bilinear form $\varphi(x, y)$ associated with a positive definite quadratic form $\psi(x)$ is
nonsingular on the space L as well as on every subspace $L' \subset L$. To complete the
proof, it suffices to recall that by definition, the quadratic form $(x^2)$ associated with
the inner product (x, y) is positive definite and to use Theorem 6.9 for the bilinear
form $\varphi(x, y) = (x, y)$. □

From relationship (3.54) for the annihilator (see Sect. 3.7) or from Theorem 7.18,
it follows that

\[ \dim (L_1)^\perp = \dim L - \dim L_1. \]

The map that is the projection of the space L onto the subspace $L_1$ parallel to $L_1^\perp$
(see the definition on p. 103) is called the orthogonal projection of L onto $L_1$. Then
the projection of the vector $x \in L$ onto the subspace $L_1$ is called its orthogonal
projection onto $L_1$. This is a natural generalization of the notion introduced above
of orthogonal projection of a vector onto a line. Similarly, for an arbitrary subset
$X \subset L$, we can define its orthogonal projection onto $L_1$.

The Gram determinant is connected to the notion of volume in a Euclidean space,
generalizing the notion of the length of a vector.

Definition 7.19 The parallelepiped spanned by vectors $a_1, \dots, a_m$ is the collection
of all vectors $\alpha_1 a_1 + \cdots + \alpha_m a_m$ for all $0 \le \alpha_i \le 1$. It is denoted by
$\Pi(a_1, \dots, a_m)$. A base of the parallelepiped $\Pi(a_1, \dots, a_m)$ is a parallelepiped
spanned by any m − 1 vectors among $a_1, \dots, a_m$, for example, $\Pi(a_1, \dots, a_{m-1})$.

Fig. 7.2 Altitude of a parallelepiped



In the case of the plane (see Example 7.2), we have parallelepipeds $\Pi(a_1)$ and
$\Pi(a_1, a_2)$. By definition, $\Pi(a_1)$ is the segment whose beginning and end coincide
with the beginning and end of the vector $a_1$, while $\Pi(a_1, a_2)$ is the parallelogram
constructed from the vectors $a_1$ and $a_2$.

We return now to the consideration of an arbitrary parallelepiped

\[ \Pi(a_1, \dots, a_m), \]

and we define the subspace $L_1 = \langle a_1, \dots, a_{m-1} \rangle$. To this case we may apply the
notion introduced above of orthogonal projection of the space L. By the decomposition
(7.10), the vector $a_m$ can be uniquely represented in the form $a_m = x + y$,
where $x \in L_1$ and $y \in L_1^\perp$. The vector y is called the altitude of the parallelepiped
$\Pi(a_1, \dots, a_m)$ dropped to the base $\Pi(a_1, \dots, a_{m-1})$. The construction we have
described is depicted in Fig. 7.2 for the case of the plane.

Now we can introduce the concept of volume of a parallelepiped

\[ \Pi(a_1, \dots, a_m), \]

or more precisely, its unoriented volume. This is by definition a nonnegative number,
denoted by $V(a_1, \dots, a_m)$ and defined by induction on m. In the case m = 1, it is
equal to $V(a_1) = |a_1|$, and in the general case, $V(a_1, \dots, a_m)$ is the product of
$V(a_1, \dots, a_{m-1})$ and the length of the altitude of the parallelepiped $\Pi(a_1, \dots, a_m)$
dropped to the base $\Pi(a_1, \dots, a_{m-1})$.

The following is a numerical expression for the unoriented volume:

\[ V^2(a_1, \dots, a_m) = G(a_1, \dots, a_m). \tag{7.11} \]

This relationship shows the geometric meaning of the Gram determinant.

Formula (7.11) is obvious for m = 1, and in the general case, it is proved by
induction on m. According to (7.10), we may represent the vector $a_m$ in the form
$a_m = x + y$, where $x \in L_1 = \langle a_1, \dots, a_{m-1} \rangle$ and $y \in L_1^\perp$. Then
$a_m = \alpha_1 a_1 + \cdots + \alpha_{m-1} a_{m-1} + y$. We note that y is the altitude of our
parallelepiped dropped to the base $\Pi(a_1, \dots, a_{m-1})$. Let us recall formula (7.7) for the
Gram determinant and subtract from its last column each of the other columns multiplied
by $\alpha_1, \dots, \alpha_{m-1}$. As a result, we obtain

\[
G(a_1, \dots, a_m) =
\begin{vmatrix}
(a_1, a_1) & (a_1, a_2) & \cdots & 0 \\
(a_2, a_1) & (a_2, a_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
(a_{m-1}, a_1) & (a_{m-1}, a_2) & \cdots & 0 \\
(a_m, a_1) & (a_m, a_2) & \cdots & (y, a_m)
\end{vmatrix},
\tag{7.12}
\]

and moreover, $(y, a_m) = (y, y) = |y|^2$, since $y \in L_1^\perp$.

Expanding the determinant (7.12) along its last column, we obtain the equality

\[ G(a_1, \dots, a_m) = G(a_1, \dots, a_{m-1})\,|y|^2. \]

Let us recall that by construction, y is the altitude of the parallelepiped
$\Pi(a_1, \dots, a_m)$ dropped to the base $\Pi(a_1, \dots, a_{m-1})$. By the induction hypothesis, we have
$G(a_1, \dots, a_{m-1}) = V^2(a_1, \dots, a_{m-1})$, and this implies

\[ G(a_1, \dots, a_m) = V^2(a_1, \dots, a_{m-1})\,|y|^2 = V^2(a_1, \dots, a_m). \]
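Formula (7.11) can be tested on small examples by computing the squared volume in two ways: from the inductive definition (product of squared altitude lengths) and from the Gram determinant. The sketch below is an illustration under stated assumptions. It works in the row space of Example 7.3, and it finds each altitude by subtracting projections onto the previously found altitudes; this successive-orthogonalization step is a swapped-in computational device (Gram-Schmidt style), not a procedure described in the text. All function names are hypothetical.

```python
from fractions import Fraction


def inner(x, y):
    """Inner product (7.1) of two rows of equal length."""
    return sum(a * b for a, b in zip(x, y))


def det(m):
    """Determinant by Laplace expansion along the first row (small sizes only)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def gram(vectors):
    """Gram determinant (7.7)."""
    return det([[inner(u, v) for v in vectors] for u in vectors])


def squared_volume(vectors):
    """V^2 from the inductive definition: product of squared altitude lengths."""
    altitudes = []           # pairwise orthogonal altitudes found so far
    v2 = Fraction(1)
    for a in vectors:
        y = [Fraction(c) for c in a]
        for b in altitudes:
            bb = inner(b, b)
            if bb != 0:      # a zero altitude contributes nothing to the span
                coeff = inner(a, b) / bb
                y = [yi - coeff * bi for yi, bi in zip(y, b)]
        altitudes.append(y)  # y is the altitude dropped to the current base
        v2 *= inner(y, y)
    return v2


sample = [[1, 1, 0], [0, 1, 1]]
lhs = squared_volume(sample)
rhs = gram(sample)
```

For the sample vectors both quantities agree, as (7.11) predicts; a linearly dependent system gives zero on both sides.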


Thus the concept of unoriented volume that we have introduced differs from the
volume and area about which we spoke in Sects. 2.1 and 2.6, since the unoriented
volume cannot assume negative values. This explains the term “unoriented.” We
shall now formulate a second way of looking at the volume of a parallelepiped,
one that generalizes the notions of volume and area about which we spoke earlier
and differs from unoriented volume by the sign ±1. By Theorem 7.14, of interest
is only the case in which the vectors $a_1, \dots, a_m$ are linearly independent. Then we
may consider the space $L = \langle a_1, \dots, a_m \rangle$ with basis $a_1, \dots, a_m$.

Thus we are given n vectors $a_1, \dots, a_n$, where n = dim L. We consider the matrix
A, whose jth column consists of the coordinates of the vector $a_j$ relative to some
orthonormal basis $e_1, \dots, e_n$:

\[
A =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}.
\]

An easy verification shows that in the matrix $A^* A$, the intersection of the ith row
and jth column contains the element $(a_i, a_j)$. This implies that the determinant of
the matrix $A^* A$ is equal to $G(a_1, \dots, a_n)$, and in view of the equalities
$|A^* A| = |A^*| \cdot |A| = |A|^2$, we obtain $|A|^2 = G(a_1, \dots, a_n)$. On the other hand, from formula
(7.11), it follows that $G(a_1, \dots, a_n) = V^2(a_1, \dots, a_n)$, and this implies that

\[ |A| = \pm V(a_1, \dots, a_n). \]

The determinant of the matrix A is called the oriented volume of the n-dimensional
parallelepiped $\Pi(a_1, \dots, a_n)$. It is denoted by $v(a_1, \dots, a_n)$. Thus the oriented and
unoriented volumes are related by the equality

\[ V(a_1, \dots, a_n) = |v(a_1, \dots, a_n)|. \]
Since the determinant of a matrix does not change under the transpose operation,
it follows that $v(a_1, \dots, a_n) = |A^*|$. In other words, for computing the oriented
volume, one may write the coordinates of the generators $a_i$ of the parallelepiped not
in the columns of the matrix, but in the rows, which is sometimes more convenient.

It is obvious that the sign of the oriented volume depends on the choice of
orthonormal basis $e_1, \dots, e_n$. This dependence is suggested by the term “oriented.”
We shall have more to say about this in Sect. 7.3.
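The relations $|A|^2 = G(a_1, \dots, a_n)$ and $v = |A| = |A^*|$ can be illustrated on a small example. The following is a hedged sketch with hypothetical sample vectors whose coordinates are taken relative to an orthonormal basis of the plane; the helper names are assumptions made for this example. The last computation reorders the basis as $(e_2, e_1)$, which reverses the coordinates of every vector and changes the sign of the oriented volume.

```python
def inner(x, y):
    """Inner product (7.1) in coordinates relative to an orthonormal basis."""
    return sum(a * b for a, b in zip(x, y))


def det(m):
    """Determinant by Laplace expansion along the first row (small sizes only)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def transpose(m):
    return [list(col) for col in zip(*m)]


vectors = [[1, 2], [3, 4]]            # a_1 = (1, 2), a_2 = (3, 4)
by_columns = det(transpose(vectors))  # matrix A: coordinates in columns
by_rows = det(vectors)                # matrix A*: coordinates in rows
gramian = det([[inner(u, w) for w in vectors] for u in vectors])

# Same vectors, coordinates taken in the reordered orthonormal basis (e_2, e_1).
reordered = [list(reversed(a)) for a in vectors]
other_sign = det(transpose(reordered))
```

The oriented volume is the same whether coordinates are written in rows or in columns, its square equals the Gram determinant, and only its sign changes when the orthonormal basis is reordered.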

The volume possesses some important properties. 


Theorem 7.20 Let $\mathcal{C} : L \to L$ be a linear transformation of the Euclidean space L
of dimension n. Then for any n vectors $a_1, \dots, a_n$ in this space, one has the
relationship

\[ v(\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)) = |\mathcal{C}|\, v(a_1, \dots, a_n). \tag{7.13} \]


Proof We shall choose an orthonormal basis of the space L. Suppose that the
transformation $\mathcal{C}$ has matrix C in this basis and that the coordinates $\alpha_1, \dots, \alpha_n$ of an
arbitrary vector a are related to the coordinates $\beta_1, \dots, \beta_n$ of its image $\mathcal{C}(a)$ by
the relationship (3.25), or in matrix notation, (3.27). Let A be the matrix whose
columns consist of the coordinates of the vectors $a_1, \dots, a_n$, and let A' be the matrix
whose columns consist of the coordinates of the vectors $\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)$. Then
it is obvious that we have the relationship A' = CA, from which it follows that
$|A'| = |C| \cdot |A|$.

To complete the proof, it remains to note that $|\mathcal{C}| = |C|$, and by the definition
of oriented volume, we have the equalities $v(a_1, \dots, a_n) = |A|$ and
$v(\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)) = |A'|$. □
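The key step of the proof, A' = CA and hence |A'| = |C| · |A|, can be verified on a concrete case. The sketch below is illustrative only; the matrices are hypothetical sample data, and `matmul` and `det` are small helpers written for this example.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small sizes only)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


C = [[2, 1], [0, 3]]       # matrix of the transformation in an orthonormal basis
A = [[1, 3], [2, 4]]       # columns are the coordinates of a_1 and a_2
A_image = matmul(C, A)     # columns are the coordinates of the images

v_before = det(A)          # oriented volume v(a_1, a_2)
v_after = det(A_image)     # oriented volume of the image parallelepiped
```

Here the oriented volume is multiplied by det C, in agreement with (7.13).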


It follows from this theorem, of course, that

\[ V(\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)) = \|C\|\, V(a_1, \dots, a_n), \tag{7.14} \]

where $\|C\|$ denotes the absolute value of the determinant of the matrix C.

Using the concepts introduced thus far, we may define an analogue of the volume
V(M) for a very broad class of sets M containing all the sets actually encountered
in mathematics and physics. This is the subject of what is called measure theory, but
since it is a topic that is rather far removed from linear algebra, it will not concern
us here. Let us note only that the important relationship (7.14) remains valid here:

\[ V(\mathcal{C}(M)) = \|C\|\, V(M). \tag{7.15} \]

An interesting example of a set in an n-dimensional Euclidean space is the ball B(r)
of radius r, namely the set of all vectors $x \in L$ such that $|x| \le r$. The set of vectors
$x \in L$ for which |x| = r is called the sphere S(r) of radius r. From the relationship
(7.15) it follows that $V(B(r)) = V_n r^n$, where $V_n = V(B(1))$. The calculation of the
interesting geometric constant $V_n$ is a question from analysis, related to the theory
of the gamma function $\Gamma$. Here we shall simply quote the result:

\[ V_n = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}. \]

It follows from the theory of the gamma function that if n is an even number
(n = 2m), then $V_n = \pi^m / m!$, and if n is odd (n = 2m + 1), then
$V_n = 2^{m+1} \pi^m / (1 \cdot 3 \cdots (2m + 1))$.
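The agreement between the gamma-function formula and the two closed forms can be checked numerically. This is a small sketch using only Python's standard `math` module; the function names are illustrative.

```python
import math


def unit_ball_volume(n):
    """V_n = pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def unit_ball_volume_closed_form(n):
    """Closed forms for even n = 2m and odd n = 2m + 1."""
    m, odd = divmod(n, 2)
    if not odd:
        return math.pi ** m / math.factorial(m)
    odd_product = math.prod(range(1, 2 * m + 2, 2))  # 1 * 3 * ... * (2m + 1)
    return 2 ** (m + 1) * math.pi ** m / odd_product


volumes = {n: unit_ball_volume(n) for n in range(1, 11)}
```

For n = 1, 2, 3 this gives the familiar values 2, $\pi$, and $4\pi/3$ (the length of the segment [−1, 1], the area of the unit disk, and the volume of the unit ball).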


7.2 Orthogonal Transformations 

Let $L_1$ and $L_2$ be Euclidean spaces of the same dimension with inner products
$(x, y)_1$ and $(x, y)_2$ defined on them. We shall denote the length of a vector x in
the spaces $L_1$ and $L_2$ by $|x|_1$ and $|x|_2$, respectively.

Definition 7.21 An isomorphism of Euclidean spaces $L_1$ and $L_2$ is an isomorphism
$\mathcal{A} : L_1 \to L_2$ of the underlying vector spaces that preserves the inner product, that
is, for arbitrary vectors $x, y \in L_1$, the following relationship holds:

\[ (x, y)_1 = (\mathcal{A}(x), \mathcal{A}(y))_2. \tag{7.16} \]

If we substitute the vector y = x into equality (7.16), we obtain that
$|x|_1^2 = |\mathcal{A}(x)|_2^2$, and this implies that $|x|_1 = |\mathcal{A}(x)|_2$, that is, the isomorphism
$\mathcal{A}$ preserves the lengths of vectors.

Conversely, if $\mathcal{A} : L_1 \to L_2$ is an isomorphism of vector spaces that preserves the
lengths of vectors, then $|\mathcal{A}(x + y)|_2^2 = |x + y|_1^2$, and therefore,

\[ |\mathcal{A}(x)|_2^2 + 2(\mathcal{A}(x), \mathcal{A}(y))_2 + |\mathcal{A}(y)|_2^2 = |x|_1^2 + 2(x, y)_1 + |y|_1^2. \]

But by assumption, we also have the equalities $|\mathcal{A}(x)|_2 = |x|_1$ and $|\mathcal{A}(y)|_2 = |y|_1$,
which implies that $(x, y)_1 = (\mathcal{A}(x), \mathcal{A}(y))_2$. This, strictly speaking, is a
consequence of the fact (Theorem 6.6) that a symmetric bilinear form (x, y) is determined
by the quadratic form (x, x), and here we have simply repeated the proof given in
Sect. 4.1.

If the spaces $L_1$ and $L_2$ have the same dimension, then from the fact that the linear
transformation $\mathcal{A} : L_1 \to L_2$ preserves the lengths of vectors, it already follows that
it is an isomorphism. Indeed, as we saw in Sect. 3.5, it suffices to verify that the
kernel of the transformation $\mathcal{A}$ is equal to (0). But if $\mathcal{A}(x) = 0$, then $|\mathcal{A}(x)|_2 = 0$,
which implies that $|x|_1 = 0$, that is, x = 0.

Theorem 7.22 All Euclidean spaces of a given finite dimension are isomorphic to
each other.

Proof From the existence of an orthonormal basis, it follows at once that every
n-dimensional Euclidean space is isomorphic to the Euclidean space in Example 7.3.
Indeed, let $e_1, \dots, e_n$ be an orthonormal basis of a Euclidean space L. Assigning to
each vector $x \in L$ the row of its coordinates in the basis $e_1, \dots, e_n$, we obtain an
isomorphism of the space L and the space $\mathbb{R}^n$ of rows of length n with inner product
(7.1) (see the remarks on p. 218). It is easily seen that isomorphism is an equivalence
relation (p. xii) on the set of Euclidean spaces, and by transitivity, it follows that all
Euclidean spaces of dimension n are isomorphic to each other. □

Theorem 7.22 is analogous to Theorem 3.64 for vector spaces, and its general
meaning is the same (this is elucidated in detail in Sect. 3.5). For example, using
Theorem 7.22, we could have proved the inequality (7.6) differently from how it
was done in the preceding section. Indeed, it is completely obvious (the inequality
is reduced to an equality) if the vectors x and y are linearly dependent. If, on the
other hand, they are linearly independent, then we can consider the subspace
$L' = \langle x, y \rangle$. By Theorem 7.22, it is isomorphic to the plane (Example 7.2 in the previous
section), where this inequality is well known. Therefore, it must also be correct for
arbitrary vectors x and y.


Definition 7.23 A linear transformation $\mathcal{U}$ of a Euclidean space L into itself that
preserves the inner product, that is, satisfies the condition that for all vectors x and y,

\[ (x, y) = (\mathcal{U}(x), \mathcal{U}(y)), \tag{7.17} \]

is said to be orthogonal.

This is clearly a special case of an isomorphism of Euclidean spaces $L_1$ and $L_2$
that coincide.

It is also easily seen that an orthogonal transformation $\mathcal{U}$ takes an orthonormal
basis to another orthonormal basis, since from the conditions (7.8) and (7.17), it
follows that $\mathcal{U}(e_1), \dots, \mathcal{U}(e_n)$ is an orthonormal basis if $e_1, \dots, e_n$ is. Conversely,
if a linear transformation $\mathcal{U}$ takes some orthonormal basis $e_1, \dots, e_n$ to another
orthonormal basis, then for vectors $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and
$y = \beta_1 e_1 + \cdots + \beta_n e_n$, we have

\[ \mathcal{U}(x) = \alpha_1 \mathcal{U}(e_1) + \cdots + \alpha_n \mathcal{U}(e_n), \qquad
   \mathcal{U}(y) = \beta_1 \mathcal{U}(e_1) + \cdots + \beta_n \mathcal{U}(e_n). \]

Since both $e_1, \dots, e_n$ and $\mathcal{U}(e_1), \dots, \mathcal{U}(e_n)$ are orthonormal bases, it follows by
(7.1) that both the left- and right-hand sides of relationship (7.17) are equal to the
expression $\alpha_1 \beta_1 + \cdots + \alpha_n \beta_n$, that is, relationship (7.17) is satisfied, and this implies
that $\mathcal{U}$ is an orthogonal transformation.

We note the following important reformulation of this fact: for any two orthonormal
bases of a Euclidean space, there exists a unique orthogonal transformation that
takes the first basis into the second.

Let $U = (u_{ij})$ be the matrix of a linear transformation $\mathcal{U}$ in some orthonormal
basis $e_1, \dots, e_n$. It follows from what has gone before that the transformation $\mathcal{U}$ is
orthogonal if and only if the vectors $\mathcal{U}(e_1), \dots, \mathcal{U}(e_n)$ form an orthonormal basis.
But by the definition of the matrix U, the vector $\mathcal{U}(e_i)$ is equal to $\sum_{k=1}^{n} u_{ki} e_k$, and
since $e_1, \dots, e_n$ is an orthonormal basis, we have

\[ (\mathcal{U}(e_i), \mathcal{U}(e_j)) = u_{1i} u_{1j} + u_{2i} u_{2j} + \cdots + u_{ni} u_{nj}. \]

The expression on the right-hand side is equal to the element $c_{ij}$, where the matrix
$(c_{ij})$ is equal to $U^* U$. This implies that the condition of orthogonality of the
transformation $\mathcal{U}$ can be written in the form

\[ U^* U = E, \tag{7.18} \]

or equivalently, $U^* = U^{-1}$. This equality is equivalent to

\[ U U^* = E, \tag{7.19} \]

and can be expressed as relationships among the elements of the matrix U:

\[ u_{i1} u_{j1} + \cdots + u_{in} u_{jn} = 0 \ \text{ for } i \ne j, \qquad u_{i1}^2 + \cdots + u_{in}^2 = 1. \tag{7.20} \]

The matrix U satisfying the relationship (7.18) or the equivalent relationship (7.19)
is said to be orthogonal.
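Condition (7.18) is straightforward to test for a given matrix. The sketch below is only an illustration, assuming real matrices given as lists of rows (so that $U^*$ is the transpose); the sample matrices and helper names are hypothetical. Exact fractions avoid any rounding in the comparison with the identity matrix E.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def is_orthogonal(u):
    """Check relation (7.18): U* U = E."""
    return matmul(transpose(u), u) == identity(len(u))


rotation_like = [
    [Fraction(3, 5), Fraction(-4, 5)],
    [Fraction(4, 5), Fraction(3, 5)],
]
shear = [[1, 1], [0, 1]]  # preserves area but not lengths
```

The first matrix has orthonormal columns and passes the test; the shear does not, even though its determinant is 1.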

The concept of an orthonormal basis of a Euclidean space can be interpreted
more graphically using the notion of flag (see the definition on p. 101). Namely, we
associate with an orthonormal basis $e_1, \dots, e_n$ the flag

\[ (0) \subset L_1 \subset L_2 \subset \cdots \subset L_n = L, \tag{7.21} \]

in which the subspace $L_i$ is equal to $\langle e_1, \dots, e_i \rangle$, and the pair $(L_{i-1}, L_i)$ is directed
in the sense that $L_i^+$ is the half-space of $L_i$ containing the vector $e_i$. In the case of a
Euclidean space, the essential fact is that we obtain a bijection between orthonormal
bases and flags.

For the proof of this, we have only to verify that the orthonormal basis $e_1, \dots, e_n$
is uniquely determined by its associated flag. Let this basis be associated with
the flag (7.21). If we have already constructed an orthonormal system of vectors
$e_1, \dots, e_{i-1}$ such that $L_{i-1} = \langle e_1, \dots, e_{i-1} \rangle$, then we should consider the
orthogonal complement $L_{i-1}^\perp$ of the subspace $L_{i-1}$ in $L_i$. Then $\dim L_{i-1}^\perp = 1$ and
$L_{i-1}^\perp = \langle e_i \rangle$, where the vector $e_i$ is uniquely defined up to the factor ±1. This factor
can be selected unambiguously based on the condition $e_i \in L_i^+$.

An observation made earlier can now be interpreted as follows: For any two flags
$\Phi_1$ and $\Phi_2$ of a Euclidean space L, there exists a unique orthogonal transformation
that maps $\Phi_1$ to $\Phi_2$.

Our next goal will be the construction of an orthonormal basis in which a given
orthogonal transformation $\mathcal{U}$ has the simplest matrix possible. By Theorem 4.22,
the transformation $\mathcal{U}$ has a one- or two-dimensional invariant subspace L'. It is clear
that the restriction of $\mathcal{U}$ to the subspace L' is again an orthogonal transformation.




Let us determine first the sort of transformation that this can be, that is, what sorts
of orthogonal transformations of one- and two-dimensional spaces exist.

If dim L = 1, then $L = \langle e \rangle$ for some nonnull vector e. Then $\mathcal{U}(e) = \alpha e$, where
$\alpha$ is some scalar. From the orthogonality of the transformation $\mathcal{U}$, we obtain that

\[ (e, e) = (\alpha e, \alpha e) = \alpha^2 (e, e), \]

from which it follows that $\alpha^2 = 1$, and this implies that $\alpha = \pm 1$. Consequently, in
a one-dimensional space L, there exist two orthogonal transformations: the identity
$\mathcal{E}$, for which $\mathcal{E}(x) = x$ for all vectors x, and the transformation $\mathcal{U}$ such that
$\mathcal{U}(x) = -x$. It is obvious that $\mathcal{U} = -\mathcal{E}$.

Now let dim L = 2, in which case L is isomorphic to the plane with inner product
(7.1). It is well known from analytic geometry that an orthogonal transformation of
the plane is either a rotation through some angle $\varphi$ about the origin or a reflection
with respect to some line l. In the first case, the orthogonal transformation $\mathcal{U}$ in an
arbitrary orthonormal basis of the plane has matrix

\[
\begin{pmatrix}
\cos\varphi & -\sin\varphi \\
\sin\varphi & \cos\varphi
\end{pmatrix}.
\tag{7.22}
\]

In the second case, the plane can be represented in the form of the direct sum
$L = l \oplus l^\perp$, where l and $l^\perp$ are lines, and for a vector x we have the decomposition
x = y + z, where $y \in l$ and $z \in l^\perp$, while the vector $\mathcal{U}(x)$ is equal to y − z. If we
choose an orthonormal basis $e_1, e_2$ in such a way that the vector $e_1$ lies on the line
l, then the transformation $\mathcal{U}$ will have matrix

\[
U =
\begin{pmatrix}
1 & 0 \\
0 & -1
\end{pmatrix}.
\tag{7.23}
\]

But we shall not presuppose this fact from analytic geometry, and instead show
that it derives from simple considerations in linear algebra. Let $\mathcal{U}$ have, in some
orthonormal basis $e_1, e_2$, the matrix

\[
U =
\begin{pmatrix}
a & b \\
c & d
\end{pmatrix},
\tag{7.24}
\]

that is, it maps the vector $x e_1 + y e_2$ to $(ax + by) e_1 + (cx + dy) e_2$. The fact that $\mathcal{U}$
preserves the length of a vector gives the relationship

\[ (ax + by)^2 + (cx + dy)^2 = x^2 + y^2 \]

for all x and y. Substituting in turn (1, 0), (0, 1), and (1, 1) for (x, y), we obtain

\[ a^2 + c^2 = 1, \qquad b^2 + d^2 = 1, \qquad ab + cd = 0. \tag{7.25} \]


From the relationship (7.19), it follows that $|UU^*| = 1$, and since $|U^*| = |U|$, it
follows that $|U|^2 = 1$, and this implies that $|U| = \pm 1$. We need to consider separately
the cases of different signs.

If |U| = −1, then the characteristic polynomial |U − tE| of the matrix (7.24) is
equal to $t^2 - (a + d)t - 1$ and has positive discriminant. Therefore, the matrix (7.24)
has two real eigenvalues $\lambda_1$ and $\lambda_2$ of opposite signs (since by Viète's theorem,
$\lambda_1 \lambda_2 = -1$) and two associated eigenvectors $e_1$ and $e_2$. Examining the restriction
of $\mathcal{U}$ to the one-dimensional invariant subspaces $\langle e_1 \rangle$ and $\langle e_2 \rangle$, we arrive at the
one-dimensional case considered above, from which, in particular, it follows that
the values $\lambda_1$ and $\lambda_2$ are equal to ±1. Let us show that the vectors $e_1$ and $e_2$ are
orthogonal. By the definition of eigenvectors, we have the equalities $\mathcal{U}(e_i) = \lambda_i e_i$,
from which we have

\[ (\mathcal{U}(e_1), \mathcal{U}(e_2)) = (\lambda_1 e_1, \lambda_2 e_2) = \lambda_1 \lambda_2 (e_1, e_2). \tag{7.26} \]

But since the transformation $\mathcal{U}$ is orthogonal, it follows that
$(\mathcal{U}(e_1), \mathcal{U}(e_2)) = (e_1, e_2)$, and from (7.26), we obtain the equality
$(e_1, e_2) = \lambda_1 \lambda_2 (e_1, e_2)$. Since $\lambda_1$ and $\lambda_2$ have opposite signs, it follows that
$(e_1, e_2) = 0$. Choosing eigenvectors $e_1$ and $e_2$ of unit length and such that
$\lambda_1 = 1$ and $\lambda_2 = -1$, we obtain the orthonormal basis $e_1, e_2$ in which the
transformation $\mathcal{U}$ has matrix (7.23). We then have the decomposition
$L = l \oplus l^\perp$, where $l = \langle e_1 \rangle$ and $l^\perp = \langle e_2 \rangle$, and the transformation $\mathcal{U}$ is
a reflection in the line l.

But if |U| = 1, then by relationship (7.25) for a, b, c, d, it is easy to derive, keeping
in mind that ad − bc = 1, that there exists an angle $\varphi$ such that $a = d = \cos\varphi$
and $c = -b = \sin\varphi$, that is, the matrix (7.24) has the form (7.22).
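The case analysis above can be turned into a small classifier for 2 × 2 orthogonal matrices. The sketch below is an illustration under stated assumptions: entries are floating-point numbers, the tolerance `tol` is an arbitrary choice, and the function name `classify` is hypothetical. For a rotation it returns the angle $\varphi$ with $a = \cos\varphi$, $c = \sin\varphi$. For a reflection it returns the angle of the line l with the first basis vector, using the standard fact that reflection in a line at angle $\psi$ has first column $(\cos 2\psi, \sin 2\psi)$.

```python
import math


def classify(a, b, c, d, tol=1e-9):
    """Classify the orthogonal matrix [[a, b], [c, d]] as a rotation or reflection."""
    # Relations (7.25) must hold for an orthogonal matrix.
    if not (
        abs(a * a + c * c - 1) < tol
        and abs(b * b + d * d - 1) < tol
        and abs(a * b + c * d) < tol
    ):
        raise ValueError("matrix is not orthogonal")
    determinant = a * d - b * c        # equals +1 or -1 up to rounding
    theta = math.atan2(c, a)           # angle of the image of e_1
    if determinant > 0:
        return ("rotation", theta)     # matrix has the form (7.22)
    return ("reflection", theta / 2)   # line l makes angle theta/2 with e_1


quarter_turn = classify(0.0, -1.0, 1.0, 0.0)
mirror_x_axis = classify(1.0, 0.0, 0.0, -1.0)   # the matrix (7.23)
mirror_diagonal = classify(0.0, 1.0, 1.0, 0.0)  # swaps the two coordinates
```

The three sample matrices are a rotation through $\pi/2$, the reflection (7.23) in the line spanned by $e_1$, and the reflection in the line at angle $\pi/4$.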

As a basis for examining the general case, we have the following theorem.

Theorem 7.24 If a subspace L' is invariant with respect to an orthogonal
transformation $\mathcal{U}$, then its orthogonal complement $(L')^\perp$ is also invariant with respect
to $\mathcal{U}$.

Proof We must show that for every vector $y \in (L')^\perp$, we have $\mathcal{U}(y) \in (L')^\perp$. If
$y \in (L')^\perp$, then (x, y) = 0 for all $x \in L'$. From the orthogonality of the
transformation $\mathcal{U}$, we obtain that $(\mathcal{U}(x), \mathcal{U}(y)) = (x, y) = 0$. Since $\mathcal{U}$ is a bijective mapping
from L to L, its restriction to the invariant subspace L' is a bijection from L' to L'. In
other words, every vector $x' \in L'$ can be represented in the form $x' = \mathcal{U}(x)$, where
x is some other vector in L'. Consequently, $(x', \mathcal{U}(y)) = 0$ for every vector $x' \in L'$,
and this implies that $\mathcal{U}(y) \in (L')^\perp$. □

Remark 7.25 In the proof of Theorem 7.24, we nowhere used the positive definiteness
of the quadratic form (x, x) associated with the inner product (x, y). Indeed,
this theorem holds as well for an arbitrary nonsingular bilinear form (x, y). The
condition of nonsingularity is required in order that the restriction of the
transformation $\mathcal{U}$ to an invariant subspace be a bijection, without which the theorem would
not be true.

Definition 7.26 Subspaces $\mathsf{L}_1$ and $\mathsf{L}_2$ of a Euclidean space are said to be mutually orthogonal if $(x, y) = 0$ for all vectors $x \in \mathsf{L}_1$ and $y \in \mathsf{L}_2$. In such a case, we write $\mathsf{L}_1 \perp \mathsf{L}_2$. The decomposition of a Euclidean space as a direct sum of orthogonal subspaces is called an orthogonal decomposition.


If $\dim \mathsf{L} > 2$, then by Theorem 4.22, the transformation $\mathcal{U}$ has a one- or two-dimensional invariant subspace. Thus using Theorem 7.24 as many times as necessary (depending on $\dim \mathsf{L}$), we obtain the orthogonal decomposition

$$\mathsf{L} = \mathsf{L}_1 \oplus \mathsf{L}_2 \oplus \cdots \oplus \mathsf{L}_k, \quad \text{where } \mathsf{L}_i \perp \mathsf{L}_j \text{ for all } i \neq j, \tag{7.27}$$

with all subspaces $\mathsf{L}_i$ invariant with respect to the transformation $\mathcal{U}$ and of dimension 1 or 2.

Combining the orthonormal bases of the subspaces $\mathsf{L}_1, \ldots, \mathsf{L}_k$ and choosing a convenient ordering, we obtain the following result.


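Theorem 7.27 For every orthogonal transformation there exists an orthonormal basis in which the matrix of the transformation has the block-diagonal form

$$
\begin{pmatrix}
1 & & & & & & & & \\
 & \ddots & & & & & & 0 & \\
 & & 1 & & & & & & \\
 & & & -1 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & -1 & & & \\
 & & & & & & A_{\varphi_1} & & \\
 & 0 & & & & & & \ddots & \\
 & & & & & & & & A_{\varphi_r}
\end{pmatrix}, \tag{7.28}
$$

where

$$A_{\varphi_i} = \begin{pmatrix} \cos\varphi_i & -\sin\varphi_i \\ \sin\varphi_i & \cos\varphi_i \end{pmatrix}, \qquad \varphi_i \neq \pi k,\ k \in \mathbb{Z}. \tag{7.29}$$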

Let us note that the determinants of all the matrices (7.29) are equal to 1, and therefore, for a proper orthogonal transformation (see the definition on p. 135), the number of $-1$'s on the main diagonal in (7.28) is even, and for an improper orthogonal transformation, that number is odd.
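The parity statement above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text; the helper names `canonical_form`, `is_orthogonal`, and `determinant` are illustrative assumptions. It assembles a matrix of the form (7.28) and lets one confirm that it is orthogonal and that its determinant is $+1$ or $-1$ according to the parity of the number of entries $-1$.

```python
import math

def canonical_form(num_plus, num_minus, angles):
    """Assemble the block-diagonal matrix (7.28) as a list of rows."""
    n = num_plus + num_minus + 2 * len(angles)
    m = [[0.0] * n for _ in range(n)]
    for i in range(num_plus):
        m[i][i] = 1.0
    for i in range(num_plus, num_plus + num_minus):
        m[i][i] = -1.0
    pos = num_plus + num_minus
    for phi in angles:
        # 2x2 rotation block A_phi from (7.29)
        c, s = math.cos(phi), math.sin(phi)
        m[pos][pos], m[pos][pos + 1] = c, -s
        m[pos + 1][pos], m[pos + 1][pos + 1] = s, c
        pos += 2
    return m

def is_orthogonal(m, tol=1e-12):
    """Check that the columns of m are orthonormal, i.e. m^T m = E."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(m[k][i] * m[k][j] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det
```

For instance, `canonical_form(1, 2, [0.7, 2.1])` has two entries $-1$ and should therefore be proper, while `canonical_form(2, 1, [0.7])` has one and should be improper.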

Let us now look at what the theorems we have proved give us in the cases $n = 1, 2, 3$ familiar from analytic geometry.

For $n = 1$, there exist, as we have already seen, altogether two orthogonal transformations, namely $\mathcal{E}$ and $-\mathcal{E}$, the first of which is proper, and the second, improper.

For $n = 2$, a proper orthogonal transformation is a rotation of the plane through some angle $\varphi$. In an arbitrary orthonormal basis, its matrix has the form $A_{\varphi}$ from (7.29), with no restriction on the angle $\varphi$. For the improper transformation appearing in (7.28), the number $-1$ must be encountered an odd number of times, that is, once. This implies that in some orthonormal basis $e_1, e_2$, its matrix has the form

$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.$$

This transformation is a reflection of the plane with respect to the line $\langle e_2 \rangle$ (Fig. 7.3).

[Fig. 7.3 Reflection of the plane with respect to a line]

Let us now consider the case $n = 3$. Since the characteristic polynomial of the transformation $\mathcal{U}$ has odd degree 3, it must have at least one real root. This implies that in the representation (7.28), the number $+1$ or $-1$ must appear on the main diagonal of the matrix.

Let us consider proper transformations first. In this case, for the matrix (7.28), we have only one possibility:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.$$

If the matrix is written in the basis $e_1, e_2, e_3$, then the transformation $\mathcal{U}$ does not change the points of the line $l = \langle e_1 \rangle$ and represents a rotation through the angle $\varphi$ in the plane $\langle e_2, e_3 \rangle$. In this case, we say that the transformation $\mathcal{U}$ is a rotation through the angle $\varphi$ about the axis $l$. That every proper orthogonal transformation of a three-dimensional Euclidean space possesses a "rotational axis" is a result first proved by Euler. We shall discuss the mechanical significance of this assertion later, in connection with motions of affine spaces.

Finally, if an orthogonal transformation is improper, then in expression (7.28), we have only the possibility

$$\begin{pmatrix} -1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.$$

In this case, the orthogonal transformation $\mathcal{U}$ reduces to a rotation about the $l$-axis with a simultaneous reflection with respect to the plane $l^{\perp}$.
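As an illustration of the three-dimensional case, here is a small hedged sketch in Python (the function names `canonical_3d`, `det3`, and `classify` are assumptions made for this example, not notation from the text). Since the determinant and the trace do not depend on the choice of basis, and the two canonical matrices above have trace $\pm 1 + 2\cos\varphi$, one can recover the type of the transformation and the angle $\varphi \in [0, \pi]$ from any matrix of the transformation in an orthonormal basis.

```python
import math

def canonical_3d(phi, improper=False):
    """Canonical matrix in 3-space: diag(1, A_phi) or diag(-1, A_phi)."""
    c, s = math.cos(phi), math.sin(phi)
    first = -1.0 if improper else 1.0
    return [[first, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def classify(m):
    """Return ('proper' or 'improper', phi) with phi in [0, pi].

    Uses det = +1 or -1 and trace = det + 2*cos(phi).
    The sign of phi cannot be recovered from the trace alone.
    """
    sign = 1.0 if det3(m) > 0 else -1.0
    trace = m[0][0] + m[1][1] + m[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_phi = max(-1.0, min(1.0, (trace - sign) / 2.0))
    kind = "proper" if sign > 0 else "improper"
    return kind, math.acos(cos_phi)
```

The input is assumed to be an orthogonal matrix; the sketch does not verify this.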




7.3 Orientation of a Euclidean Space* 

In a Euclidean space, as in any real vector space, there are defined the notions 
of equal and opposite orientations of two bases and orientation of the space (see 
Sect. 4.4). But in Euclidean spaces, these notions possess certain specific features.

Let $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ be two orthonormal bases of a Euclidean space $\mathsf{L}$. By the general definition, they have equal orientations if the transformation from one basis to the other is proper. This implies that for a transformation $\mathcal{U}$ such that

$$\mathcal{U}(e_1) = e'_1, \quad \ldots, \quad \mathcal{U}(e_n) = e'_n,$$

the determinant of its matrix is positive. But in the case that both bases under consideration are orthonormal, the mapping $\mathcal{U}$, as we know, is orthogonal, and its matrix $U$ satisfies the relationship $|U| = \pm 1$. This implies that $\mathcal{U}$ is a proper transformation if and only if $|U| = 1$, and it is improper if and only if $|U| = -1$. We have the following analogue to Theorems 4.38-4.40 of Sect. 4.4.

Theorem 7.28 Two orthogonal transformations of a real Euclidean space can be continuously deformed into each other if and only if the signs of their determinants coincide.

The definition of a continuous deformation repeats here the definition given in Sect. 4.4 for the set $\mathfrak{A}$, but now consisting only of orthogonal matrices (or transformations). Since the product of any two orthogonal transformations is again orthogonal, Lemma 4.37 (p. 159) is also valid in this case, and we shall make use of it.

Proof of Theorem 7.28 Let us show that an arbitrary proper orthogonal transformation $\mathcal{U}$ can be continuously deformed into the identity. Since the condition of continuous deformability defines an equivalence relation on the set of orthogonal transformations, then by transitivity, the assertion of the theorem will follow for all proper transformations.

Thus we must prove that there exists a family of orthogonal transformations $\mathcal{U}_t$ depending continuously on the parameter $t \in [0, 1]$ for which $\mathcal{U}_0 = \mathcal{E}$ and $\mathcal{U}_1 = \mathcal{U}$. The continuous dependence of $\mathcal{U}_t$ implies that when it is represented in an arbitrary basis, all the elements of the matrices of the transformations $\mathcal{U}_t$ are continuous functions of $t$. We note that this is not at all an obvious corollary to Theorem 4.38. Indeed, that theorem did not guarantee us that all the intermediate transformations $\mathcal{U}_t$ for $0 < t < 1$ are orthogonal. A possible "bad" deformation $\mathcal{A}_t$ taking us out of the domain of orthogonal transformations is depicted as the dotted line in Fig. 7.4.

[Fig. 7.4 Deformation taking us outside the domain of orthogonal transformations; the two regions are labeled "orthogonal transformations" and "nonorthogonal transformations"]

We shall use Theorem 7.27 and examine the orthonormal basis in which the matrix of the transformation $\mathcal{U}$ has the form (7.28). The transformation $\mathcal{U}$ is proper if and only if the number of instances of $-1$ on the main diagonal of (7.28) is even. We observe that the second-order matrix

$$\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$$

can also be written in the form (7.29) for $\varphi_i = \pi$. Thus a proper orthogonal transformation can be written in a suitable orthonormal basis in block-diagonal form

$$\begin{pmatrix} E & & & 0 \\ & A_{\varphi_1} & & \\ & & \ddots & \\ 0 & & & A_{\varphi_k} \end{pmatrix}, \tag{7.30}$$

where the arguments $\varphi_i$ can now be taken to be any values. Formula (7.30) in fact gives a continuous deformation of the transformation $\mathcal{U}$ into $\mathcal{E}$. To maintain agreement with our notation, let us examine the transformations $\mathcal{U}_t$ having in this same basis the matrix

$$\begin{pmatrix} E & & & 0 \\ & A_{t\varphi_1} & & \\ & & \ddots & \\ 0 & & & A_{t\varphi_k} \end{pmatrix}. \tag{7.31}$$

Then it is clear first of all that the transformation $\mathcal{U}_t$ is orthogonal for every $t$, and secondly, that $\mathcal{U}_0 = \mathcal{E}$ and $\mathcal{U}_1 = \mathcal{U}$. This gives us a proof of the theorem in the case of a proper transformation.
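The deformation (7.31) is easy to trace numerically. The sketch below, in plain Python with hypothetical helper names `u_t` and `gram_error`, builds the matrix (7.31) for a given $t$ and measures how far it is from satisfying $U^{*}U = E$; under the construction described above this error should vanish (up to rounding) for every $t \in [0, 1]$.

```python
import math

def u_t(num_ones, angles, t):
    """Matrix (7.31): identity block of size num_ones, then blocks A_{t*phi_i}."""
    n = num_ones + 2 * len(angles)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    pos = num_ones
    for phi in angles:
        c, s = math.cos(t * phi), math.sin(t * phi)
        m[pos][pos], m[pos][pos + 1] = c, -s
        m[pos + 1][pos], m[pos + 1][pos + 1] = s, c
        pos += 2
    return m

def gram_error(m):
    """Largest entry of |m^T m - E|; it is zero exactly when m is orthogonal."""
    n = len(m)
    return max(
        abs(sum(m[k][i] * m[k][j] for k in range(n)) - (1.0 if i == j else 0.0))
        for i in range(n)
        for j in range(n)
    )
```

Taking one of the angles equal to $\pi$ shows how a pair of entries $-1$ on the diagonal is absorbed into a rotation block and then rotated back to the identity as $t$ decreases to 0.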

Let us now consider improper orthogonal transformations and show that any such transformation $\mathcal{V}$ can be continuously deformed into a reflection with respect to a hyperplane, that is, into a transformation $\mathcal{F}$ having in some orthonormal basis the matrix

$$F = \begin{pmatrix} -1 & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{pmatrix}. \tag{7.32}$$

Let us choose an arbitrary orthonormal basis of the vector space and suppose that in this basis, the improper orthogonal transformation $\mathcal{V}$ has matrix $V$. Then it is obvious that the transformation $\mathcal{U}$ with matrix $U = VF$ in this same basis is a proper orthogonal transformation. Taking into account the obvious relationship $F^{-1} = F$, we have $V = UF$, that is, $\mathcal{V} = \mathcal{U}\mathcal{F}$. We shall use the family $\mathcal{U}_t$ effecting a continuous deformation of the proper transformation $\mathcal{U}$ into $\mathcal{E}$. From the preceding equality, with the help of Lemma 4.37, we obtain the continuous family $\mathcal{V}_t = \mathcal{U}_t\mathcal{F}$, where $\mathcal{V}_0 = \mathcal{E}\mathcal{F} = \mathcal{F}$ and $\mathcal{V}_1 = \mathcal{U}\mathcal{F} = \mathcal{V}$. Thus the family $\mathcal{V}_t = \mathcal{U}_t\mathcal{F}$ effects the deformation of the improper transformation $\mathcal{V}$ into $\mathcal{F}$. □

[Fig. 7.5 Oriented length (labels in the figure: $B$, $O$, $e$, $A$)]

In analogy to what we did in Sect. 4.4, Theorem 7.28 gives us the following topological result: the set of orthogonal transformations consists of two path-connected components: the proper and improper orthogonal transformations.

Exactly as in Sect. 4.4, from what we have proved, it also follows that two equally oriented orthonormal bases can be continuously deformed into each other. That is, if $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ are orthonormal bases with the same orientation, then there exists a family of orthonormal bases $e_1(t), \ldots, e_n(t)$ depending continuously on the parameter $t \in [0, 1]$ such that $e_i(0) = e_i$ and $e_i(1) = e'_i$. In other words, the concept of orientation of a space is the same whether we define it in terms of an arbitrary basis or an orthonormal one. We shall further examine oriented Euclidean spaces, choosing an orientation arbitrarily. This choice makes it possible to speak of positively and negatively oriented orthonormal bases.

Now we can compare the concepts of oriented and unoriented volume. These two numbers differ by the factor $\pm 1$ (unoriented volumes are nonnegative by definition). When the oriented volume of a parallelepiped $\Pi(a_1, \ldots, a_n)$ in a space $\mathsf{L}$ of dimension $n$ was introduced, we noted that its definition depends on the choice of some orthonormal basis $e_1, \ldots, e_n$. Since we are assuming that the space $\mathsf{L}$ is oriented, we can include in the definition of oriented volume of a parallelepiped $\Pi(a_1, \ldots, a_n)$ the condition that the basis $e_1, \ldots, e_n$ used in the definition of $v(a_1, \ldots, a_n)$ be positively oriented. Then the number $v(a_1, \ldots, a_n)$ does not depend on the choice of basis (that is, it remains unchanged if instead of $e_1, \ldots, e_n$, we take any other orthonormal positively oriented basis $e'_1, \ldots, e'_n$). This follows immediately from formula (7.13) for the transformation $\mathcal{C} = \mathcal{U}$ and from the fact that the transformation $\mathcal{U}$ taking one basis to the other is orthogonal and proper, that is, $|U| = 1$.

We can now say that the oriented volume $v(a_1, \ldots, a_n)$ is positive (and consequently equal to the unoriented volume) if the bases $e_1, \ldots, e_n$ and $a_1, \ldots, a_n$ are equally oriented, and is negative (that is, it differs from the unoriented volume by a sign) if these bases have opposite orientations. For example, on the line (Fig. 7.5), the length of the segment $OA$ is equal to 2, while the length of the segment $OB$ is equal to $-2$.

Thus, we may say that for the parallelepiped $\Pi(a_1, \ldots, a_n)$, its oriented volume is its "volume with orientation."

If we choose a coordinate origin on the real line, then a basis of it consists of a single vector, and vectors $e_1$ and $\alpha e_1$ are equally oriented if they lie to one side of the origin, that is, $\alpha > 0$. The choice of orientation on the line, one might say, corresponds to the choice of "right" and "left."

In the real plane, the orientation given by the basis $e_1, e_2$ is determined by the "direction of rotation" from $e_1$ to $e_2$: clockwise or counterclockwise. Equally oriented bases $e_1, e_2$ and $e'_1, e'_2$ (Fig. 7.6(a) and (b)) can be continuously transformed one into the other, while oppositely oriented bases cannot, even if they form equal figures (Fig. 7.6(a) and (c)), since what is required for this is a reflection, that is, an improper transformation.

[Fig. 7.6 Oriented bases of the plane]

In real three-dimensional space, the orientation is defined by a basis of three orthonormal vectors. We again meet with two opposite orientations, which are represented by our right and left hands (see Fig. 7.7(a)). Another method of providing an orientation in three-dimensional space is defined by a helix (Fig. 7.7(b)). In this case, the orientation is defined by the direction in which the helix turns as it rises: clockwise or counterclockwise.²

[Fig. 7.7 Different orientations of three-dimensional space]

² The molecules of amino acids likewise determine a certain orientation of space. In biology, the two possible orientations are designated by D (right = dexter in Latin) and L (left = laevus). For some unknown reason, they all determine the same orientation, namely the counterclockwise one.


7.4 Examples*

Example 7.29 By the term "figure" in a Euclidean space $\mathsf{L}$ we shall understand an arbitrary subset $S \subset \mathsf{L}$. Two figures $S$ and $S'$ contained in a Euclidean space $\mathsf{M}$ of dimension $n$ are said to be congruent, or geometrically identical, if there exists an orthogonal transformation $\mathcal{U}$ of the space $\mathsf{M}$ taking $S$ to $S'$. We shall be interested in the following question: When are figures $S$ and $S'$ congruent, that is, when do we have $\mathcal{U}(S) = S'$?

Let us first deal with the case in which the figures $S$ and $S'$ consist of collections of $m$ vectors: $S = (a_1, \ldots, a_m)$ and $S' = (a'_1, \ldots, a'_m)$ with $m \le n$. For $S$ and $S'$ to be congruent is equivalent to the existence of an orthogonal transformation $\mathcal{U}$ such that $\mathcal{U}(a_i) = a'_i$ for all $i = 1, \ldots, m$. For this, of course, it is necessary that the following equality holds:

$$(a_i, a_j) = (a'_i, a'_j), \quad i, j = 1, \ldots, m. \tag{7.33}$$

Let us assume that the vectors $a_1, \ldots, a_m$ are linearly independent, and we shall then prove that the condition (7.33) is sufficient. By Theorem 7.14, in this case we have $G(a_1, \ldots, a_m) > 0$, and by assumption, $G(a'_1, \ldots, a'_m) = G(a_1, \ldots, a_m)$. From this same theorem, it follows that the vectors $a'_1, \ldots, a'_m$ will also be linearly independent.

Let us set

$$\mathsf{L} = \langle a_1, \ldots, a_m \rangle, \qquad \mathsf{L}' = \langle a'_1, \ldots, a'_m \rangle, \tag{7.34}$$

and consider first the case $m = n$. Then $\mathsf{M} = \langle a_1, \ldots, a_m \rangle$. We shall consider the transformation $\mathcal{U} : \mathsf{M} \to \mathsf{M}$ given by the conditions $\mathcal{U}(a_i) = a'_i$ for all $i = 1, \ldots, m$. Obviously, such a transformation is uniquely determined, and by the relationship

$$\Bigl( \mathcal{U}\Bigl( \sum_{i=1}^{m} \alpha_i a_i \Bigr), \mathcal{U}\Bigl( \sum_{j=1}^{m} \beta_j a_j \Bigr) \Bigr) = \sum_{i,j=1}^{m} \alpha_i \beta_j (a'_i, a'_j)$$

and equality (7.33), it is orthogonal.

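Let $m < n$. Then we have the decomposition $\mathsf{M} = \mathsf{L} \oplus \mathsf{L}^{\perp} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}$, where the subspaces $\mathsf{L}$ and $\mathsf{L}'$ of the space $\mathsf{M}$ are defined by formula (7.34). By what has gone before, there exists an isomorphism $\mathcal{V} : \mathsf{L} \to \mathsf{L}'$ of Euclidean spaces such that $\mathcal{V}(a_i) = a'_i$ for all $i = 1, \ldots, m$. The orthogonal complements $\mathsf{L}^{\perp}$ and $(\mathsf{L}')^{\perp}$ of these subspaces have dimension $n - m$, and consequently, are also isomorphic (Theorem 7.22). Let us choose an arbitrary isomorphism $\mathcal{W} : \mathsf{L}^{\perp} \to (\mathsf{L}')^{\perp}$ of Euclidean spaces. As a result of the decomposition $\mathsf{M} = \mathsf{L} \oplus \mathsf{L}^{\perp}$, an arbitrary vector $x \in \mathsf{M}$ can be uniquely represented in the form $x = y + z$, where $y \in \mathsf{L}$ and $z \in \mathsf{L}^{\perp}$. Let us define the linear transformation $\mathcal{U} : \mathsf{M} \to \mathsf{M}$ by the formula $\mathcal{U}(x) = \mathcal{V}(y) + \mathcal{W}(z)$. By construction, $\mathcal{U}(a_i) = a'_i$ for all $i = 1, \ldots, m$, and a trivial verification shows that the transformation $\mathcal{U}$ is orthogonal.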
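Condition (7.33) says that the two systems of vectors have the same matrix of pairwise inner products. A minimal sketch of this test in plain Python follows; the names `gram_matrix` and `gram_matrices_agree` and the tolerance `tol` are assumptions of this illustration, and the vectors are taken to be given by coordinates in an orthonormal basis.

```python
def gram_matrix(vectors):
    """Matrix of pairwise inner products (a_i, a_j) of the given vectors."""
    return [[sum(p * q for p, q in zip(u, v)) for v in vectors] for u in vectors]

def gram_matrices_agree(s, s_prime, tol=1e-12):
    """Check condition (7.33): (a_i, a_j) = (a'_i, a'_j) for all i, j."""
    g, h = gram_matrix(s), gram_matrix(s_prime)
    if len(g) != len(h):
        return False
    return all(
        abs(g[i][j] - h[i][j]) <= tol
        for i in range(len(g))
        for j in range(len(g))
    )
```

For example, the pair $(1,0,0), (1,1,0)$ and its image $(0,1,0), (-1,1,0)$ under a quarter turn about the third axis pass the test, while the pair $(1,0,0), (0,1,0)$ does not match the first system.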

Let us now consider the case that $S = l$ and $S' = l'$ are lines, and consequently, consist of an infinite number of vectors. It suffices to set $l = \langle e \rangle$ and $l' = \langle e' \rangle$, where $|e| = |e'| = 1$, and to use the fact that there exists an orthogonal transformation $\mathcal{U}$ of the space $\mathsf{M}$ taking $e$ to $e'$. Thus any two lines are congruent.

The next case in order of increasing complexity is that in which figures $S$ and $S'$ each consist of two lines: $S = l_1 \cup l_2$ and $S' = l'_1 \cup l'_2$. Let us set $l_i = \langle e_i \rangle$ and $l'_i = \langle e'_i \rangle$, where $|e_i| = |e'_i| = 1$ for $i = 1$ and 2. Now, however, vectors $e_1$ and $e_2$ are no longer defined uniquely, but can be replaced by $-e_1$ or $-e_2$. In this case, their lengths do not change, but the inner product $(e_1, e_2)$ can change its sign, that is, what remains unchanged is only its absolute value $|(e_1, e_2)|$. Based on previous considerations, we may say that figures $S$ and $S'$ are congruent if and only if $|(e_1, e_2)| = |(e'_1, e'_2)|$. If $\varphi$ is the angle between the vectors $e_1$ and $e_2$, then we see that the lines $l_1$ and $l_2$ determine $|\cos\varphi|$, or equivalently the angle $\varphi$, for which $0 \le \varphi \le \frac{\pi}{2}$. In textbooks on geometry, one often reads about two angles between straight lines, the "acute" and "obtuse" angles, but we shall choose only the one that is acute or a right angle. This angle $\varphi$ is called the angle between the lines $l_1$ and $l_2$. The previous exposition shows that two pairs of lines $l_1, l_2$ and $l'_1, l'_2$ are congruent if and only if the angles between them thus defined coincide.
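The angle between two lines defined above can be computed directly from any spanning vectors. The following short Python sketch (the function name `angle_between_lines` is hypothetical) normalizes the spanning vectors and takes the absolute value of their inner product, so that the result lies in $[0, \pi/2]$ and does not change when a spanning vector is replaced by its negative.

```python
import math

def angle_between_lines(e1, e2):
    """Angle in [0, pi/2] between the lines <e1> and <e2> (nonzero vectors)."""
    dot = sum(p * q for p, q in zip(e1, e2))
    norm1 = math.sqrt(sum(p * p for p in e1))
    norm2 = math.sqrt(sum(q * q for q in e2))
    # The absolute value removes the dependence on the signs of e1 and e2;
    # min() guards against rounding slightly above 1.
    cos_phi = min(1.0, abs(dot) / (norm1 * norm2))
    return math.acos(cos_phi)
```

For the vectors $(1, 0)$ and $(-1, 1)$, which form an angle of $135°$ as vectors, the lines they span form the angle $\pi/4$.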

The case in which a figure $S$ consists of a line $l$ and a plane $\mathsf{L}$ ($\dim l = 1$, $\dim \mathsf{L} = 2$) is also related, strictly speaking, to elementary geometry, since $\dim(l + \mathsf{L}) \le 3$, and the figure $S = l \cup \mathsf{L}$ can be embedded in three-dimensional space. But we shall consider it from a more abstract point of view, using the language of Euclidean spaces. Let $l = \langle e \rangle$ and let $f$ be the orthogonal projection of $e$ onto $\mathsf{L}$. The angle $\varphi$ between the lines $l$ and $l' = \langle f \rangle$ is called the angle between $l$ and $\mathsf{L}$ (as already mentioned above, it is acute or right). The cosine of this angle can be calculated according to the following formula:

$$\cos\varphi = \frac{|(e, f)|}{|e| \cdot |f|}. \tag{7.35}$$
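Formula (7.35) can be turned into a short computation. In the Python sketch below, the plane is assumed to be given by an orthonormal basis `g1`, `g2` (this input format, like the name `angle_line_plane`, is an assumption of the example). The projection $f$ is computed as $(e, g_1)g_1 + (e, g_2)g_2$; when $f = 0$, formula (7.35) is not defined and the sketch returns the right angle.

```python
import math

def dot(u, v):
    """Inner product of two coordinate vectors."""
    return sum(p * q for p, q in zip(u, v))

def angle_line_plane(e, g1, g2):
    """Angle between the line <e> and the plane <g1, g2>.

    g1, g2 are assumed to be orthonormal; e is any nonzero vector.
    """
    c1, c2 = dot(e, g1), dot(e, g2)
    # Orthogonal projection of e onto the plane.
    f = [c1 * a + c2 * b for a, b in zip(g1, g2)]
    norm_e = math.sqrt(dot(e, e))
    norm_f = math.sqrt(dot(f, f))
    if norm_f == 0.0:
        return math.pi / 2  # e is orthogonal to the plane
    cos_phi = min(1.0, abs(dot(e, f)) / (norm_e * norm_f))  # formula (7.35)
    return math.acos(cos_phi)
```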


Let us show that if the angle between the line $l$ and the plane $\mathsf{L}$ is equal to the angle between the line $l'$ and the plane $\mathsf{L}'$, then the figures $S = l \cup \mathsf{L}$ and $S' = l' \cup \mathsf{L}'$ are congruent. First of all, it is obvious that there exists an orthogonal transformation taking $\mathsf{L}$ to $\mathsf{L}'$, so that we may consider that $\mathsf{L} = \mathsf{L}'$. Let $l = \langle e \rangle$, $|e| = 1$ and $l' = \langle e' \rangle$, $|e'| = 1$, and let us denote by $f$ and $f'$ the orthogonal projections of $e$ and $e'$ onto $\mathsf{L}$. By assumption,

$$\frac{|(e, f)|}{|e| \cdot |f|} = \frac{|(e', f')|}{|e'| \cdot |f'|}. \tag{7.36}$$

Since $e$ and $e'$ can be represented in the form $e = f + x$ and $e' = f' + y$, where $x, y \in \mathsf{L}^{\perp}$, it follows that $|(e, f)| = |f|^2$, $|(e', f')| = |f'|^2$. Moreover, $|e| = |e'| = 1$, and the relationship (7.36) shows that $|f| = |f'|$.

Since $e = x + f$, we have $|e|^2 = |x|^2 + 2(x, f) + |f|^2$, from which, if we take into account the equalities $|e|^2 = 1$ and $(x, f) = 0$, we obtain $|x|^2 = 1 - |f|^2$ and analogously, $|y|^2 = 1 - |f'|^2$. From this follows the equality $|x| = |y|$. Let us define the orthogonal transformation $\mathcal{U}$ of the space $\mathsf{M} = \mathsf{L} \oplus \mathsf{L}^{\perp}$ whose restriction to the plane $\mathsf{L}$ carries the vector $f$ to $f'$ (this is possible because $|f| = |f'|$), while the restriction to its orthogonal complement $\mathsf{L}^{\perp}$ takes the vector $x$ to $y$ (which is possible on account of the equality $|x| = |y|$). Clearly, $\mathcal{U}$ takes $e$ to $e'$ and hence $l$ to $l'$, and by construction, the plane $\mathsf{L}$ in both figures is one and the same, and the transformation $\mathcal{U}$ takes it into itself.

We encounter a new and more interesting situation when we consider the case in which a figure $S$ consists of a pair of planes $\mathsf{L}_1$ and $\mathsf{L}_2$ ($\dim \mathsf{L}_1 = \dim \mathsf{L}_2 = 2$). If $\mathsf{L}_1 \cap \mathsf{L}_2 \neq (0)$, then $\dim(\mathsf{L}_1 + \mathsf{L}_2) \le 3$, and we are dealing with a question from elementary geometry (which, however, can be considered simply in the language of Euclidean spaces). Therefore, we shall assume that $\mathsf{L}_1 \cap \mathsf{L}_2 = (0)$ and similarly, that $\mathsf{L}'_1 \cap \mathsf{L}'_2 = (0)$. When are figures $S = \mathsf{L}_1 \cup \mathsf{L}_2$ and $S' = \mathsf{L}'_1 \cup \mathsf{L}'_2$ congruent? It turns out that for this to occur, it is necessary that there be agreement of not one (as in the examples considered above) but two parameters, which can be interpreted as two angles between the planes $\mathsf{L}_1$ and $\mathsf{L}_2$.

We shall consider all possible straight lines lying in the plane $\mathsf{L}_1$ and the angles that they form with the plane $\mathsf{L}_2$. To this end, we recall the geometric interpretation of the angle between a line $l$ and a plane $\mathsf{L}$. If $l = \langle e \rangle$, where $|e| = 1$, then the angle $\varphi$ between $l$ and $\mathsf{L}$ is determined by formula (7.35) with the condition $0 \le \varphi \le \frac{\pi}{2}$, where $f$ is the orthogonal projection of the vector $e$ onto $\mathsf{L}$. From this, it follows that $e = f + x$, where $x \in \mathsf{L}^{\perp}$, and this implies that $(e, f) = (f, f) + (x, f) = |f|^2$, whence the relationship (7.35) gives $|\cos\varphi| = |f|$. In other words, to consider all the angles between lines lying in the plane $\mathsf{L}_1$ and the plane $\mathsf{L}_2$, we must consider the circle in $\mathsf{L}_1$ consisting of all vectors of length 1 and the lengths of the orthogonal projections of these vectors onto the plane $\mathsf{L}_2$. In order to write down these angles in a formula, we shall consider the orthogonal projection $\mathsf{M} \to \mathsf{L}_2$ of the space $\mathsf{M}$ onto the plane $\mathsf{L}_2$. Let us denote by $\mathcal{P}$ the restriction of this linear transformation to the plane $\mathsf{L}_1$. Then the angles of interest to us are given by the formula $|\cos\varphi| = |\mathcal{P}(e)|$, where $e$ runs over all possible vectors in the plane $\mathsf{L}_1$ of unit length. We restrict our attention to the case in which the linear transformation $\mathcal{P}$ is an isomorphism. The case in which this does not occur, that is, when the kernel of the transformation $\mathcal{P}$ is not equal to $(0)$ and the image is not equal to $\mathsf{L}_2$, is dealt with similarly.

Since $\mathcal{P}$ is an isomorphism, there is an inverse transformation $\mathcal{P}^{-1} : \mathsf{L}_2 \to \mathsf{L}_1$. Let us choose in the planes $\mathsf{L}_1$ and $\mathsf{L}_2$ orthonormal bases $e_1, e_2$ and $g_1, g_2$. Let the vector $e \in \mathsf{L}_1$ have unit length. We set $f = \mathcal{P}(e)$, and assuming that $f = x_1 g_1 + x_2 g_2$, we shall obtain equations for the coordinates $x_1$ and $x_2$. Let us set

$$\mathcal{P}^{-1}(g_1) = \alpha e_1 + \beta e_2, \qquad \mathcal{P}^{-1}(g_2) = \gamma e_1 + \delta e_2.$$

Since $f = \mathcal{P}(e)$, it follows that

$$e = \mathcal{P}^{-1}(f) = x_1 \mathcal{P}^{-1}(g_1) + x_2 \mathcal{P}^{-1}(g_2) = (\alpha x_1 + \gamma x_2) e_1 + (\beta x_1 + \delta x_2) e_2,$$

and the condition $|\mathcal{P}^{-1}(f)| = 1$, which we shall write in the form $|\mathcal{P}^{-1}(f)|^2 = 1$, reduces to the equality $(\alpha x_1 + \gamma x_2)^2 + (\beta x_1 + \delta x_2)^2 = 1$, that is,

$$(\alpha^2 + \beta^2) x_1^2 + 2(\alpha\gamma + \beta\delta) x_1 x_2 + (\gamma^2 + \delta^2) x_2^2 = 1. \tag{7.37}$$

Equation (7.37) with variables $x_1, x_2$ defines a second-degree curve in the rectangular coordinate system determined by the vectors $g_1$ and $g_2$. This curve is bounded, since $|f| \le |e|$ ($f$ is the orthogonal projection of the vector $e$), and this implies that $(f, f) \le 1$, that is, $x_1^2 + x_2^2 \le 1$. As one learns in a course on analytic geometry, such a curve is an ellipse. In our case, it has its center of symmetry at the origin $O$, that is, it is unchanged by a change of variables $x_1 \to -x_1$, $x_2 \to -x_2$ (see Fig. 7.8).

It is known from analytic geometry that an ellipse has two distinguished points $A$ and $A'$, symmetric with respect to the origin, such that the length $|OA| = |OA'|$ is greater than $|OC|$ for all other points $C$ of the ellipse. The segment $OA$ (equal in length to $OA'$) is called the semimajor axis of the ellipse. Similarly, there exist points $B$ and $B'$ symmetric with respect to the origin such that the segment $OB$, with $|OB| = |OB'|$, is shorter than every other segment $OC$. The segment $OB$ (equal in length to $OB'$) is called the semiminor axis of the ellipse.

[Fig. 7.8 Ellipse described by equation (7.37)]



Let us recall that the length of an arbitrary line segment $OC$, where $C$ is any point on the ellipse, gives us the value $\cos\varphi$, where $\varphi$ is the angle between a certain line contained in $\mathsf{L}_1$ and the plane $\mathsf{L}_2$. From this it follows that $\cos\varphi$ attains its maximum for one value of $\varphi$, while for some other value of $\varphi$ it attains its minimum. Let us denote these angles by $\varphi_1$ and $\varphi_2$ respectively. By definition, $0 \le \varphi_1 \le \varphi_2 \le \frac{\pi}{2}$. It is these two angles that are called the angles between the planes $\mathsf{L}_1$ and $\mathsf{L}_2$.
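The two angles can also be computed without drawing the ellipse. The Python sketch below uses a different but equivalent route, which is an assumption of this illustration rather than the method of the text: for $e = x e_1 + y e_2$ with $x^2 + y^2 = 1$, the quantity $|\mathcal{P}(e)|^2$ is a quadratic form in $x, y$, and its largest and smallest values on the unit circle are the eigenvalues of its symmetric $2 \times 2$ matrix. The planes are assumed to be given by orthonormal bases, and the function name `angles_between_planes` is hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two coordinate vectors."""
    return sum(p * q for p, q in zip(u, v))

def angles_between_planes(e1, e2, g1, g2):
    """Return (phi1, phi2), phi1 <= phi2, for L1 = <e1, e2> and L2 = <g1, g2>.

    Both pairs are assumed to be orthonormal bases of their planes.
    """
    # Coordinates of P(e1) and P(e2) in the basis g1, g2 of L2.
    a, c = dot(e1, g1), dot(e1, g2)
    b, d = dot(e2, g1), dot(e2, g2)
    # |P(x e1 + y e2)|^2 = p x^2 + 2 q x y + r y^2.
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # Eigenvalues of the symmetric matrix [[p, q], [q, r]].
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    cos_max = min(1.0, math.sqrt(mean + radius))
    cos_min = min(1.0, math.sqrt(max(0.0, mean - radius)))
    return math.acos(cos_max), math.acos(cos_min)
```

For example, in four-dimensional space with $g_1, g_2$ the first two coordinate vectors, $e_1 = (0.6, 0, 0.8, 0)$ and $e_2 = (0, 0.28, 0, 0.96)$, the cosines of the two angles are $0.6$ and $0.28$.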

The case that we have omitted, in which the transformation $\mathcal{P}$ has a nonnull kernel, reduces to the case in which the ellipse depicted in Fig. 7.8 shrinks to a line segment.

It now remains for us to check that if both angles between the planes $(\mathsf{L}_1, \mathsf{L}_2)$ are equal to the corresponding angles between the planes $(\mathsf{L}'_1, \mathsf{L}'_2)$, then the figures $S = \mathsf{L}_1 \cup \mathsf{L}_2$ and $S' = \mathsf{L}'_1 \cup \mathsf{L}'_2$ will be congruent, that is, there exists an orthogonal transformation $\mathcal{U}$ taking the plane $\mathsf{L}_i$ into $\mathsf{L}'_i$, $i = 1, 2$.

Let $\varphi_1$ and $\varphi_2$ be the angles between $\mathsf{L}_1$ and $\mathsf{L}_2$, equal, by hypothesis, to the angles between $\mathsf{L}'_1$ and $\mathsf{L}'_2$. Reasoning as previously (in the case of the angle between a line and a plane), we can find an orthogonal transformation that takes $\mathsf{L}_2$ to $\mathsf{L}'_2$. This implies that we may assume that $\mathsf{L}_2 = \mathsf{L}'_2$. Let us denote this plane by $\mathsf{L}$. Here, of course, the angles $\varphi_1$ and $\varphi_2$ remain unchanged. Let $\cos\varphi_1 \ge \cos\varphi_2$ for the pair of planes $\mathsf{L}_1$ and $\mathsf{L}$. This implies that $\cos\varphi_1$ and $\cos\varphi_2$ are the lengths of the semimajor and semiminor axes of the ellipse that we considered above. This is also the case for the pair of planes $\mathsf{L}'_1$ and $\mathsf{L}$. By construction, this means that $\cos\varphi_1 = |f_1| = |f'_1|$ and $\cos\varphi_2 = |f_2| = |f'_2|$, where the vectors $f_i \in \mathsf{L}$ are orthogonal projections of the vectors $e_i \in \mathsf{L}_1$ of length 1. Reasoning similarly, we obtain the vectors $f'_i \in \mathsf{L}$ and $e'_i \in \mathsf{L}'_1$, $i = 1, 2$.

Since $|f_1| = |f'_1|$, $|f_2| = |f'_2|$, and since by well-known properties of the ellipse, its semimajor and semiminor axes are orthogonal, we can find an orthogonal transformation of the space $\mathsf{M}$ that takes $f_1$ to $f'_1$ and $f_2$ to $f'_2$, and having done so, assume that $f_1 = f'_1$ and $f_2 = f'_2$. Since an ellipse is defined by its semiaxes, it follows that the ellipses $C$ and $C'$ that are obtained in the plane $\mathsf{L}$ from the planes $\mathsf{L}_1$ and $\mathsf{L}'_1$ simply coincide. Let us consider the orthogonal projection of the space $\mathsf{M}$ onto the plane $\mathsf{L}$. Let us denote by $\mathcal{P}$ its restriction to the plane $\mathsf{L}_1$, and by $\mathcal{P}'$ its restriction to the plane $\mathsf{L}'_1$.

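We shall assume, as we did previously, that the transformations $\mathcal{P} : \mathsf{L}_1 \to \mathsf{L}$ and $\mathcal{P}' : \mathsf{L}'_1 \to \mathsf{L}$ are isomorphisms of the corresponding linear spaces, but it is not at all necessary that they be isomorphisms of Euclidean spaces. Let us represent this with arrows in a commutative diagram

$$
\begin{array}{ccc}
\mathsf{L}_1 & \xrightarrow{\ \mathcal{V}\ } & \mathsf{L}'_1 \\
 & {\scriptstyle \mathcal{P}} \searrow & \downarrow {\scriptstyle \mathcal{P}'} \\
 & & \mathsf{L}
\end{array}
\tag{7.38}
$$

and let us show that the transformations $\mathcal{P}$ and $\mathcal{P}'$ differ from each other by an isomorphism of Euclidean spaces $\mathsf{L}_1$ and $\mathsf{L}'_1$. In other words, we claim that the transformation $\mathcal{V} = (\mathcal{P}')^{-1}\mathcal{P}$ is an isomorphism of the Euclidean spaces $\mathsf{L}_1$ and $\mathsf{L}'_1$.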

As the product of isomorphisms of linear spaces, the transformation $\mathcal{V}$ is also an isomorphism, that is, a bijective linear transformation. It remains for us to verify that $\mathcal{V}$ preserves the inner product. As noted above, to do this, it suffices to verify that $\mathcal{V}$ preserves the lengths of vectors. Let $x$ be a vector in $\mathsf{L}_1$. If $x = 0$, then the vector $\mathcal{V}(x)$ is equal to $0$ by the linearity of $\mathcal{V}$, and the assertion is obvious. If $x \neq 0$, then we set $e = \alpha^{-1}x$, where $\alpha = |x|$, and then $|e| = 1$. The vector $\mathcal{P}(e)$ is contained in the ellipse $C$ in the plane $\mathsf{L}$. Since $C = C'$, it follows that $\mathcal{P}(e) = \mathcal{P}'(e')$, where $e'$ is some vector in the plane $\mathsf{L}'_1$ and $|e'| = 1$. From this we obtain the equality $(\mathcal{P}')^{-1}\mathcal{P}(e) = e'$, that is, $\mathcal{V}(e) = e'$ and $|e'| = 1$, which implies that $|\mathcal{V}(x)| = \alpha = |x|$, which is what we had to prove.

We shall now consider a basis of the plane $\mathsf{L}$ consisting of vectors $f_1$ and $f_2$ lying on the semimajor and semiminor axes of the ellipse $C = C'$ and augment it with vectors $e_1, e_2$, where $\mathcal{P}(e_i) = f_i$. We thereby obtain four vectors $e_1, e_2, f_1, f_2$ in the space $\mathsf{L}_1 + \mathsf{L}$ (it is easily verified that they are linearly independent). Similarly, in the space $\mathsf{L}'_1 + \mathsf{L}$, we shall construct four vectors $e'_1, e'_2, f_1, f_2$. We shall show that there exists an orthogonal transformation of the space $\mathsf{M}$ taking the first set of four vectors into the second. To do so, it suffices to prove that the inner products of the associated vectors (in the order in which we have written them) coincide. Here what is least trivial is the relationship $(e_1, e_2) = (e'_1, e'_2)$, but it follows from the fact that $e'_i = \mathcal{V}(e_i)$, where $\mathcal{V}$ is an isomorphism of the Euclidean spaces $\mathsf{L}_1$ and $\mathsf{L}'_1$. The relationship $(e_1, f_1) = (e'_1, f_1)$ is a consequence of the fact that $f_1$ is an orthogonal projection, $(e_1, f_1) = |f_1|^2$, and similarly, $(e'_1, f_1) = |f_1|^2$. The remaining relationships are even more obvious.

Thus the figures $S = \mathsf{L}_1 \cup \mathsf{L}_2$ and $S' = \mathsf{L}'_1 \cup \mathsf{L}'_2$ are congruent if and only if both angles between the planes $\mathsf{L}_1, \mathsf{L}_2$ and $\mathsf{L}'_1, \mathsf{L}'_2$ coincide. With the help of theorems to be proved in Sect. 7.5, it will be easy for the reader to investigate the case of a pair of subspaces $\mathsf{L}_1, \mathsf{L}_2 \subset \mathsf{M}$ of arbitrary dimension. In this case, the answer to the question whether two pairs of subspaces $S = \mathsf{L}_1 \cup \mathsf{L}_2$ and $S' = \mathsf{L}'_1 \cup \mathsf{L}'_2$ are congruent is determined by the agreement of two finite sets of numbers that can be interpreted as "angles" between the subspaces $\mathsf{L}_1, \mathsf{L}_2$ and $\mathsf{L}'_1, \mathsf{L}'_2$.

Example 7.30 When the senior of the two authors of this textbook gave the course on which it is based (this was probably in 1952 or 1953) at Moscow State University, he told his students about a question that had arisen in the work of A.N. Kolmogorov, A.A. Petrov, and N.V. Smirnov, the answer to which in one particular case had been obtained by A.I. Maltsev. This question was presented by the professor as an example of an unsolved problem that had been worked on by noted mathematicians yet could be formulated entirely in the language of linear algebra. At the next lecture, that is, a week later, one of the students in the class came up to him and said that he had found a solution to the problem.³

The question posed by A.N. Kolmogorov et al. was this: In a Euclidean space $\mathsf{L}$ of dimension $n$, we are given $n$ nonnull mutually orthogonal vectors $x_1, \ldots, x_n$, that is, $(x_i, x_j) = 0$ for all $i \neq j$, $i, j = 1, \ldots, n$. For what values $m < n$ does there exist an $m$-dimensional subspace $\mathsf{M} \subset \mathsf{L}$ such that the orthogonal projections of the vectors $x_1, \ldots, x_n$ to it all have the same length? A.I. Maltsev showed that if all the vectors $x_1, \ldots, x_n$ have the same length, then there exists such a subspace $\mathsf{M}$ of each dimension $m < n$.

The general case is approached as follows. Let us set $|x_i| = \alpha_i$ and assume that there exists an $m$-dimensional subspace $\mathsf{M}$ such that the orthogonal projections of all vectors $x_i$ to it have the same length $\alpha$. Let us denote by $\mathcal{P}$ the orthogonal projection onto the subspace $\mathsf{M}$, so that $|\mathcal{P}(x_i)| = \alpha$. Let us set $f_i = \alpha_i^{-1} x_i$. Then the vectors $f_1, \ldots, f_n$ form an orthonormal basis of the space $\mathsf{L}$. Conversely, let us select in $\mathsf{L}$ an orthonormal basis $e_1, \ldots, e_n$ such that the vectors $e_1, \ldots, e_m$ form a basis in $\mathsf{M}$, that is, for the decomposition

$$\mathsf{L} = \mathsf{M} \oplus \mathsf{M}^{\perp}, \tag{7.39}$$

we join the orthonormal basis $e_1, \ldots, e_m$ of the subspace $\mathsf{M}$ to the orthonormal basis $e_{m+1}, \ldots, e_n$ of the subspace $\mathsf{M}^{\perp}$.

Let $f_i = \sum_{k=1}^{n} u_{ki} e_k$. Then we can interpret the matrix $U = (u_{ki})$ as the matrix of the linear transformation $\mathcal{U}$, written in terms of the basis $e_1, \ldots, e_n$, taking vectors $e_1, \ldots, e_n$ to vectors $f_1, \ldots, f_n$. Since both sets of vectors $e_1, \ldots, e_n$ and $f_1, \ldots, f_n$ are orthonormal bases, it follows that $\mathcal{U}$ is an orthogonal transformation, in particular, by formula (7.18), satisfying the relationship

$$UU^{*} = E. \tag{7.40}$$

From the decomposition (7.39) we see that every vector $f_i$ can be uniquely represented in the form of a sum $f_i = u_i + v_i$, where $u_i \in \mathsf{M}$ and $v_i \in \mathsf{M}^{\perp}$. By definition, the orthogonal projection of the vector $f_i$ onto the subspace $\mathsf{M}$ is equal to $\mathcal{P}(f_i) = u_i$. By construction of the basis $e_1, \ldots, e_n$, it follows that

$$\mathcal{P}(f_i) = \sum_{k=1}^{m} u_{ki} e_k.$$

³ It was published as L.B. Nisnevich, V.I. Bryzgalov, "On a problem of n-dimensional geometry," Uspekhi Mat. Nauk 8:4(56) (1953), 169-172.

By assumption, we have the equalities $|\mathcal{P}(f_i)|^2 = |\mathcal{P}(\alpha_i^{-1} x_i)|^2 = \alpha^2 \alpha_i^{-2}$, which in coordinates assume the form

$$\sum_{k=1}^{m} u_{ki}^2 = \alpha^2 \alpha_i^{-2}, \quad i = 1, \ldots, n.$$

If we sum these relationships for all $i = 1, \ldots, n$ and change the order of summation in the double sum, then taking into account the relationship (7.40) for the orthogonal matrix $U$, we obtain the equality

$$\alpha^2 \sum_{i=1}^{n} \alpha_i^{-2} = \sum_{i=1}^{n} \sum_{k=1}^{m} u_{ki}^2 = \sum_{k=1}^{m} \sum_{i=1}^{n} u_{ki}^2 = m, \tag{7.41}$$

from which it follows that $\alpha$ can be expressed in terms of $\alpha_1, \ldots, \alpha_n$, and $m$ by the formula

$$\alpha^2 = m \Bigl( \sum_{i=1}^{n} \alpha_i^{-2} \Bigr)^{-1}. \tag{7.42}$$


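From this, in view of the equalities $|\mathcal{P}(\boldsymbol{f}_i)|^2 = |\mathcal{P}(\alpha_i^{-1} \boldsymbol{x}_i)|^2 = \alpha^2 \alpha_i^{-2}$, we obtain
the expressions

$$|\mathcal{P}(\boldsymbol{f}_i)|^2 = m \Bigl( \alpha_i^2 \sum_{j=1}^{n} \alpha_j^{-2} \Bigr)^{-1}, \quad i = 1, \ldots, n.$$

By Theorem 7.10, we have $|\mathcal{P}(\boldsymbol{f}_i)| \le |\boldsymbol{f}_i|$, and since by construction, $|\boldsymbol{f}_i| = 1$, we
obtain the inequalities

$$m \Bigl( \alpha_i^2 \sum_{j=1}^{n} \alpha_j^{-2} \Bigr)^{-1} \le 1, \quad i = 1, \ldots, n,$$

from which it follows that

$$\alpha_i^2 \sum_{j=1}^{n} \alpha_j^{-2} \ge m, \quad i = 1, \ldots, n. \tag{7.43}$$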

Thus the inequalities (7.43) are necessary for the solvability of the problem. Let 
us show that they are also sufficient. 
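The necessary condition lends itself to a quick numerical check. The following is only a sketch in Python (standard library only); the function names are ours and do not come from the text. It evaluates $\alpha^2$ by formula (7.42) and tests the inequalities (7.43) for a given list of lengths $\alpha_1, \ldots, \alpha_n$.

```python
def common_projection_length_sq(alphas, m):
    """alpha^2 as given by formula (7.42): m / sum_i alpha_i^(-2)."""
    return m / sum(a ** -2 for a in alphas)


def satisfies_7_43(alphas, m, eps=1e-12):
    """Check alpha_i^2 * sum_j alpha_j^(-2) >= m for every i (inequalities (7.43))."""
    s = sum(a ** -2 for a in alphas)
    return all(a * a * s >= m - eps for a in alphas)


lengths = [1.0, 2.0, 2.0]
for m in (1, 2):
    print(m, satisfies_7_43(lengths, m), common_projection_length_sq(lengths, m))
```

For the lengths 1, 2, 2 the condition holds for $m = 1$ but fails for $m = 2$, whereas for equal lengths it holds for every $m < n$, in agreement with Maltsev's result quoted above.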

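Let us consider first the case $m = 1$. We observe that in this situation, the
inequalities (7.43) are automatically satisfied for an arbitrary collection of positive
numbers $\alpha_1, \ldots, \alpha_n$. Therefore, for an arbitrary system of mutually orthogonal
vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ in $\mathsf{L}$, we must produce a line $\mathsf{M} \subset \mathsf{L}$ such that the orthogonal
projections of all these vectors onto it have the same length. For this, we shall take as such
a line $\mathsf{M} = \langle \boldsymbol{y} \rangle$ with the vector

$$\boldsymbol{y} = \sum_{i=1}^{n} \frac{(\alpha_1 \cdots \alpha_n)^2}{\alpha_i^2} \, \boldsymbol{x}_i,$$

where as before, $\alpha_i^2 = (\boldsymbol{x}_i, \boldsymbol{x}_i)$. Since $\frac{(\boldsymbol{x}_i, \boldsymbol{y})}{|\boldsymbol{y}|^2} \boldsymbol{y} \in \mathsf{M}$ and
$\bigl( \boldsymbol{x}_i - \frac{(\boldsymbol{x}_i, \boldsymbol{y})}{|\boldsymbol{y}|^2} \boldsymbol{y}, \boldsymbol{y} \bigr) = 0$, it follows
that the orthogonal projection of the vector $\boldsymbol{x}_i$ onto the line $\mathsf{M}$ is equal to

$$\mathcal{P}(\boldsymbol{x}_i) = \frac{(\boldsymbol{x}_i, \boldsymbol{y})}{|\boldsymbol{y}|^2} \, \boldsymbol{y}.$$

Clearly, the length of each such projection

$$|\mathcal{P}(\boldsymbol{x}_i)| = \frac{|(\boldsymbol{x}_i, \boldsymbol{y})|}{|\boldsymbol{y}|} = \frac{(\alpha_1 \cdots \alpha_n)^2}{|\boldsymbol{y}|}$$

does not depend on the index of the vector $\boldsymbol{x}_i$. Thus we have proved that for an
arbitrary system of $n$ nonnull mutually orthogonal vectors in an $n$-dimensional
Euclidean space, there exists a line such that the orthogonal projections of all vectors
onto it have the same length.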

To facilitate understanding in what follows, we shall use the symbol $P(m, n)$
to denote the following assertion: If the lengths $\alpha_1, \ldots, \alpha_n$ of a system of mutually
orthogonal vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ in an $n$-dimensional Euclidean space $\mathsf{L}$ satisfy
condition (7.43), then there exists an $m$-dimensional subspace $\mathsf{M} \subset \mathsf{L}$ such that the
orthogonal projections $\mathcal{P}(\boldsymbol{x}_1), \ldots, \mathcal{P}(\boldsymbol{x}_n)$ of the vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ onto it have the
same length $\alpha$, expressed by the formula (7.42). Using this convention, we may say
that we have proved the assertion $P(1, n)$ for all $n > 1$.
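The case $m = 1$ can be checked numerically. The sketch below (Python, standard library only; the helper names are ours) assumes that the line is spanned by the vector $\boldsymbol{y} = \sum_i (\alpha_1 \cdots \alpha_n)^2 \alpha_i^{-2} \boldsymbol{x}_i$, which is the choice for which $(\boldsymbol{x}_i, \boldsymbol{y})$ is the same for every $i$, and compares the common projection length with formula (7.42).

```python
import math


def dot(u, v):
    """Standard inner product of two coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))


def equalizing_direction(xs):
    """y = sum_i (alpha_1...alpha_n)^2 / alpha_i^2 * x_i for mutually orthogonal xs."""
    prod_sq = math.prod(dot(x, x) for x in xs)  # (alpha_1 ... alpha_n)^2
    y = [0.0] * len(xs[0])
    for x in xs:
        coeff = prod_sq / dot(x, x)
        y = [yi + coeff * xi for yi, xi in zip(y, x)]
    return y


def projection_length(x, y):
    """Length of the orthogonal projection of x onto the line spanned by y."""
    return abs(dot(x, y)) / math.sqrt(dot(y, y))


# Three mutually orthogonal vectors with squared lengths 2, 8, 9.
xs = [[1.0, 1.0, 0.0], [2.0, -2.0, 0.0], [0.0, 0.0, 3.0]]
y = equalizing_direction(xs)
lengths = [projection_length(x, y) for x in xs]
alpha_sq = 1 / sum(1 / dot(x, x) for x in xs)  # formula (7.42) with m = 1
print(lengths, math.sqrt(alpha_sq))
```

All three printed lengths coincide, and their common value agrees with the number $\alpha$ from (7.42) for $m = 1$.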

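Before passing to the case of arbitrary $m$, let us recast the problem in a more
convenient form. Let $\beta_1, \ldots, \beta_n$ be arbitrary numbers satisfying the following
condition:

$$\beta_1 + \cdots + \beta_n = m, \quad 0 < \beta_i \le 1, \quad i = 1, \ldots, n. \tag{7.44}$$

Let us denote by $P'(m, n)$ the following assertion: In the Euclidean space $\mathsf{L}$ there
exist an orthonormal basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ and an $m$-dimensional subspace $\mathsf{L}' \subset \mathsf{L}$ such
that the orthogonal projections $\mathcal{P}'(\boldsymbol{g}_i)$ of the basis vectors onto $\mathsf{L}'$ have length $\sqrt{\beta_i}$,
that is,

$$|\mathcal{P}'(\boldsymbol{g}_i)|^2 = \beta_i, \quad i = 1, \ldots, n.$$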


Lemma 7.31 The assertions $P(m, n)$ and $P'(m, n)$ with a suitable choice of
numbers $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ are equivalent.


Proof Let us first prove that the assertion $P'(m, n)$ follows from the assertion
$P(m, n)$. Here we are given a collection of numbers $\beta_1, \ldots, \beta_n$ satisfying the
condition (7.44), and it is known that the assertion $P(m, n)$ holds for arbitrary positive
numbers $\alpha_1, \ldots, \alpha_n$ satisfying condition (7.43). For the numbers $\beta_1, \ldots, \beta_n$ and an
arbitrary orthonormal basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ we define vectors $\boldsymbol{x}_i = \beta_i^{-1/2} \boldsymbol{g}_i$, $i = 1, \ldots, n$.
It is clear that these vectors are mutually orthogonal, and furthermore, $|\boldsymbol{x}_i| = \beta_i^{-1/2}$.
Let us prove that the numbers $\alpha_i = \beta_i^{-1/2}$ satisfy the inequalities (7.43). Indeed, if
we take into account the condition (7.44), we have

$$\alpha_i^2 \sum_{j=1}^{n} \alpha_j^{-2} = \beta_i^{-1} \sum_{j=1}^{n} \beta_j = \beta_i^{-1} m \ge m.$$

The assertion $P(m, n)$ says that in the space $\mathsf{L}$ there exists an $m$-dimensional
subspace $\mathsf{M}$ such that the lengths of the orthogonal projections of the vectors $\boldsymbol{x}_i$
onto it are equal to

$$\alpha = \Bigl( m \Bigl( \sum_{i=1}^{n} \alpha_i^{-2} \Bigr)^{-1} \Bigr)^{1/2} = \Bigl( m \Bigl( \sum_{i=1}^{n} \beta_i \Bigr)^{-1} \Bigr)^{1/2} = 1.$$

But then the lengths of the orthogonal projections of the vectors $\boldsymbol{g}_i$ onto the same
subspace $\mathsf{M}$ are equal to $|\mathcal{P}(\boldsymbol{g}_i)| = |\mathcal{P}(\sqrt{\beta_i}\, \boldsymbol{x}_i)| = \sqrt{\beta_i}$.

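Now let us prove that the assertion $P'(m, n)$ yields $P(m, n)$. Here we are given
a collection of nonnull mutually orthogonal vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$ of length $|\boldsymbol{x}_i| = \alpha_i$,
and moreover, the numbers $\alpha_i$ satisfy the inequalities (7.43). Let us set

$$\beta_i = m \, \alpha_i^{-2} \Bigl( \sum_{j=1}^{n} \alpha_j^{-2} \Bigr)^{-1}$$

and verify that the $\beta_i$ satisfy conditions (7.44). The equality $\beta_1 + \cdots + \beta_n = m$ clearly
follows from the definition of the numbers $\beta_i$. From the inequalities (7.43) it follows
that

$$\alpha_i^2 \ge m \Bigl( \sum_{j=1}^{n} \alpha_j^{-2} \Bigr)^{-1},$$

and this implies that

$$\beta_i = m \, \alpha_i^{-2} \Bigl( \sum_{j=1}^{n} \alpha_j^{-2} \Bigr)^{-1} \le 1.$$

The assertion $P'(m, n)$ says that there exist an orthonormal basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ of
the space $\mathsf{L}$ and an $m$-dimensional subspace $\mathsf{L}' \subset \mathsf{L}$ such that the lengths of the
orthogonal projections of the vectors $\boldsymbol{g}_i$ onto it are equal to $|\mathcal{P}'(\boldsymbol{g}_i)| = \sqrt{\beta_i}$. But
then the orthogonal projections of the mutually orthogonal vectors $\beta_i^{-1/2} \boldsymbol{g}_i$ onto
the same subspace $\mathsf{L}'$ will have the same length, namely 1.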

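To prove the assertion $P(m, n)$ for given vectors $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n$, it now suffices to
consider the linear transformation $\mathcal{U}$ of the space $\mathsf{L}$ mapping the vectors $\boldsymbol{g}_i$ to
$\mathcal{U}(\boldsymbol{g}_i) = \boldsymbol{f}_i$, where $\boldsymbol{f}_i = \alpha_i^{-1} \boldsymbol{x}_i$. Since the bases $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ are
orthonormal, it follows that $\mathcal{U}$ is an orthogonal transformation, and therefore, the
orthogonal projection of $\boldsymbol{x}_i = \alpha_i \, \mathcal{U}(\boldsymbol{g}_i)$ onto the $m$-dimensional subspace $\mathsf{M} = \mathcal{U}(\mathsf{L}')$
has length $\alpha_i \sqrt{\beta_i}$, which by the definition of $\beta_i$ is the same for all $i$. Moreover, by
what we have proved above, this length is equal to the number $\alpha$ determined by
formula (7.42). This completes the proof of the lemma. □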
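The passage from the lengths $\alpha_i$ to the numbers $\beta_i$ used in the second half of the proof is easy to tabulate. The following Python sketch (the function names are ours) computes $\beta_i = m \alpha_i^{-2} (\sum_j \alpha_j^{-2})^{-1}$ and tests condition (7.44); it also confirms that $\alpha_i \sqrt{\beta_i}$ does not depend on $i$.

```python
import math


def betas_from_alphas(alphas, m):
    """beta_i = m * alpha_i^(-2) / sum_j alpha_j^(-2), as in the proof of Lemma 7.31."""
    s = sum(a ** -2 for a in alphas)
    return [m * a ** -2 / s for a in alphas]


def satisfies_7_44(betas, m, eps=1e-12):
    """Check beta_1 + ... + beta_n = m and 0 < beta_i <= 1 (condition (7.44))."""
    return abs(sum(betas) - m) < eps and all(0 < b <= 1 + eps for b in betas)


alphas = [1.0, 2.0, 2.0]
betas = betas_from_alphas(alphas, 1)
common = [a * math.sqrt(b) for a, b in zip(alphas, betas)]  # alpha_i * sqrt(beta_i)
print(betas, satisfies_7_44(betas, 1), common)
```

For $m = 2$ the same lengths give $\beta_1 = 4/3 > 1$, so (7.44) fails exactly when (7.43) does.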


Thanks to the lemma, we may prove the assertion $P'(m, n)$ instead of the
assertion $P(m, n)$. We shall do so by induction on $m$ and $n$. We have already proved the
base case of the induction ($m = 1$, $n > 1$). The inductive step will be divided into
three parts:

(1) From assertion $P'(m, n)$ for $2m \le n + 1$ we shall derive $P'(m, n + 1)$.

(2) We shall prove that the assertion $P'(m, n)$ implies $P'(n - m, n)$.

(3) We shall prove that the assertion $P'(m + 1, n)$ for all $n > m + 1$ is a consequence
of the assertion $P'(m', n)$ for all $m' \le m$ and $n > m'$.

Part 1: From assertion $P'(m, n)$ for $2m \le n + 1$, we derive $P'(m, n + 1)$. We shall
consider the collection of positive numbers $\beta_1, \ldots, \beta_n, \beta_{n+1}$ satisfying conditions
(7.44) with $n$ replaced by $n + 1$, with $2m \le n + 1$. Without loss of generality, we
may assume that $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_{n+1}$. Since $\beta_1 + \cdots + \beta_{n+1} = m$ and $n + 1 \ge 2m$,
it follows that $\beta_n + \beta_{n+1} \le 1$. Indeed, for example for odd $n$, the contrary
assumption would give the inequality

$$\underbrace{\beta_1 + \beta_2 \ge \cdots \ge \beta_n + \beta_{n+1} > 1}_{(n+1)/2 \text{ sums}},$$

from which clearly follows $\beta_1 + \cdots + \beta_{n+1} > (n + 1)/2 \ge m$, which contradicts the
assumption that has been made.

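Let us consider the $(n + 1)$-dimensional Euclidean space $\mathsf{L}$ and decompose it as
a direct sum $\mathsf{L} = \langle \boldsymbol{e} \rangle \oplus \langle \boldsymbol{e} \rangle^{\perp}$, where $\boldsymbol{e} \in \mathsf{L}$ is an arbitrary vector of length 1. By the
induction hypothesis, the assertion $P'(m, n)$ holds for the numbers $\beta_1, \ldots, \beta_{n-1}$ and
$\beta = \beta_n + \beta_{n+1}$ and the $n$-dimensional Euclidean space $\langle \boldsymbol{e} \rangle^{\perp}$. This implies that in
the space $\langle \boldsymbol{e} \rangle^{\perp}$ there exist an orthonormal basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ and an $m$-dimensional
subspace $\mathsf{L}'$ such that the squares of the lengths of the orthogonal projections of the
vectors $\boldsymbol{g}_i$ onto $\mathsf{L}'$ are equal to

$$|\mathcal{P}'(\boldsymbol{g}_i)|^2 = \beta_i, \quad i = 1, \ldots, n - 1, \qquad |\mathcal{P}'(\boldsymbol{g}_n)|^2 = \beta_n + \beta_{n+1}.$$

We shall denote by $\bar{\mathcal{P}} : \mathsf{L} \to \mathsf{L}'$ the orthogonal projection of the space $\mathsf{L}$ onto
$\mathsf{L}'$ (in this case, of course, $\bar{\mathcal{P}}(\boldsymbol{e}) = \boldsymbol{0}$), and we construct in $\mathsf{L}$ an orthonormal basis
$\bar{\boldsymbol{g}}_1, \ldots, \bar{\boldsymbol{g}}_{n+1}$ for which $|\bar{\mathcal{P}}(\bar{\boldsymbol{g}}_i)|^2 = \beta_i$ for all $i = 1, \ldots, n + 1$.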

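Let us set $\bar{\boldsymbol{g}}_i = \boldsymbol{g}_i$ for $i = 1, \ldots, n - 1$ and $\bar{\boldsymbol{g}}_n = a \boldsymbol{g}_n + b \boldsymbol{e}$, $\bar{\boldsymbol{g}}_{n+1} = c \boldsymbol{g}_n + d \boldsymbol{e}$,
where the numbers $a, b, c, d$ are chosen in such a way that the following conditions
are satisfied:

$$|\bar{\boldsymbol{g}}_n| = |\bar{\boldsymbol{g}}_{n+1}| = 1, \quad (\bar{\boldsymbol{g}}_n, \bar{\boldsymbol{g}}_{n+1}) = 0, \quad |\bar{\mathcal{P}}(\bar{\boldsymbol{g}}_n)|^2 = \beta_n, \quad |\bar{\mathcal{P}}(\bar{\boldsymbol{g}}_{n+1})|^2 = \beta_{n+1}. \tag{7.45}$$

Then the system of vectors $\bar{\boldsymbol{g}}_1, \ldots, \bar{\boldsymbol{g}}_{n+1}$ proves the assertion $P'(m, n + 1)$.
The relationships (7.45) can be rewritten in the form

$$a^2 + b^2 = c^2 + d^2 = 1, \quad ac + bd = 0,$$

$$a^2 (\beta_n + \beta_{n+1}) = \beta_n, \quad c^2 (\beta_n + \beta_{n+1}) = \beta_{n+1}.$$

It is easily verified that these relationships will be satisfied if we set

$$b = \pm c, \quad d = \mp a, \quad a = \sqrt{\frac{\beta_n}{\beta_n + \beta_{n+1}}}, \quad c = \sqrt{\frac{\beta_{n+1}}{\beta_n + \beta_{n+1}}}.$$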
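The choice of the coefficients can be verified mechanically. The short Python sketch below (the function name is ours) takes the sign choice $b = c$, $d = -a$ and checks the rewritten form of the relationships (7.45) for sample values of $\beta_n$ and $\beta_{n+1}$.

```python
import math


def split_coefficients(beta_n, beta_n1):
    """Coefficients a, b, c, d for the vectors a*g_n + b*e and c*g_n + d*e."""
    total = beta_n + beta_n1
    a = math.sqrt(beta_n / total)
    c = math.sqrt(beta_n1 / total)
    b, d = c, -a  # the sign choice b = +c, d = -a
    return a, b, c, d


beta_n, beta_n1 = 0.3, 0.2
a, b, c, d = split_coefficients(beta_n, beta_n1)
checks = {
    "a^2 + b^2": a * a + b * b,                    # expected 1
    "c^2 + d^2": c * c + d * d,                    # expected 1
    "ac + bd": a * c + b * d,                      # expected 0
    "a^2 (beta_n + beta_n1)": a * a * (beta_n + beta_n1),  # expected beta_n
    "c^2 (beta_n + beta_n1)": c * c * (beta_n + beta_n1),  # expected beta_n1
}
print(checks)
```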


Before proceeding to part 2, let us make the following observation. 


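Proposition 7.32 To prove the assertion $P'(m, n)$, we may assume that $\beta_i < 1$ for
all $i = 1, \ldots, n$.


Proof Let $1 = \beta_1 = \cdots = \beta_k > \beta_{k+1} \ge \cdots \ge \beta_n > 0$. We choose in the
$n$-dimensional vector space $\mathsf{L}$ an arbitrary subspace $\mathsf{L}_k$ of dimension $k$ and consider
the orthogonal decomposition $\mathsf{L} = \mathsf{L}_k \oplus \mathsf{L}_k^{\perp}$. We note that

$$1 > \beta_{k+1} \ge \cdots \ge \beta_n > 0 \quad \text{and} \quad \beta_{k+1} + \cdots + \beta_n = m - k.$$

Therefore, if the assertion $P'(m - k, n - k)$ holds for the numbers $\beta_{k+1}, \ldots, \beta_n$,
then in $\mathsf{L}_k^{\perp}$ there exist a subspace $\mathsf{L}'_k$ of dimension $m - k$ and an orthonormal basis
$\boldsymbol{g}_{k+1}, \ldots, \boldsymbol{g}_n$ such that $|\mathcal{P}(\boldsymbol{g}_i)|^2 = \beta_i$ for $i = k + 1, \ldots, n$, where $\mathcal{P} : \mathsf{L}_k^{\perp} \to \mathsf{L}'_k$ is
an orthogonal projection.

We now set $\mathsf{L}' = \mathsf{L}_k \oplus \mathsf{L}'_k$ and choose in $\mathsf{L}_k$ an arbitrary orthonormal basis
$\boldsymbol{g}_1, \ldots, \boldsymbol{g}_k$. Then if $\mathcal{P}' : \mathsf{L} \to \mathsf{L}'$ is the orthogonal projection, we have that
$|\mathcal{P}'(\boldsymbol{g}_i)|^2 = 1$ for $i = 1, \ldots, k$ and $|\mathcal{P}'(\boldsymbol{g}_i)|^2 = \beta_i$ for $i = k + 1, \ldots, n$. □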


Part 2: Assertion $P'(m, n)$ implies assertion $P'(n - m, n)$. Let us consider $n$
numbers $\beta_1 \ge \cdots \ge \beta_n$ satisfying condition (7.44) in which the number $m$ is
replaced by $n - m$. We must construct an orthogonal projection $\mathcal{P}' : \mathsf{L} \to \mathsf{L}'$ of the
$n$-dimensional Euclidean space $\mathsf{L}$ onto an $(n - m)$-dimensional subspace $\mathsf{L}'$ and
an orthonormal basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ in $\mathsf{L}$ for which the conditions $|\mathcal{P}'(\boldsymbol{g}_i)|^2 = \beta_i$,
$i = 1, \ldots, n$, are satisfied. By the previous observation, we may assume that all $\beta_i$ are
less than 1. Then the numbers $\beta'_i = 1 - \beta_i$ satisfy conditions (7.44), and by assertion
$P'(m, n)$, there exist an orthogonal projection $\bar{\mathcal{P}} : \mathsf{L} \to \bar{\mathsf{L}}$ of the space $\mathsf{L}$ onto an
$m$-dimensional subspace $\bar{\mathsf{L}}$ and an orthonormal basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$ for which the
conditions $|\bar{\mathcal{P}}(\boldsymbol{g}_i)|^2 = \beta'_i$ are satisfied. For the desired $(n - m)$-dimensional subspace
we shall take $\mathsf{L}' = \bar{\mathsf{L}}^{\perp}$ and denote by $\mathcal{P}'$ the orthogonal projection onto $\mathsf{L}'$. Then for
each $i = 1, \ldots, n$, the equalities

$$\boldsymbol{g}_i = \bar{\mathcal{P}}(\boldsymbol{g}_i) + \mathcal{P}'(\boldsymbol{g}_i),$$

$$1 = |\boldsymbol{g}_i|^2 = |\bar{\mathcal{P}}(\boldsymbol{g}_i)|^2 + |\mathcal{P}'(\boldsymbol{g}_i)|^2 = \beta'_i + |\mathcal{P}'(\boldsymbol{g}_i)|^2$$

are satisfied, from which it follows that $|\mathcal{P}'(\boldsymbol{g}_i)|^2 = 1 - \beta'_i = \beta_i$.

Part 3: Assertion $P'(m + 1, n)$ for all $n > m + 1$ is a consequence of $P'(m', n)$
for all $m' \le m$ and $n > m'$. By our assumption, the assertion $P'(m, n)$ holds in
particular for $n = 2m + 1$. By part 2, we may assert that $P'(m + 1, 2m + 1)$ holds,
and since $2(m + 1) \le (2m + 1) + 1$, then by virtue of part 1, we may conclude that
$P'(m + 1, n)$ holds for all $n \ge 2m + 1$. It remains to prove the assertions $P'(m + 1, n)$
for $m + 2 \le n \le 2m$. But these assertions follow from $P'(n - (m + 1), n)$ by part 2.
It is necessary only to verify that the inequalities $1 \le n - (m + 1) \le m$ are satisfied,
which follows directly from the assumption that $m + 2 \le n \le 2m$.


7.5 Symmetric Transformations 

As we observed at the beginning of Sect. 7.1, for a Euclidean space $\mathsf{L}$, there exists
a natural isomorphism $\mathsf{L} \xrightarrow{\sim} \mathsf{L}^*$ that allows us to identify in this case the space $\mathsf{L}^*$
with $\mathsf{L}$. In particular, using the definition given in Sect. 3.7, we may define for an
arbitrary basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$ the dual basis $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_n$ of the space $\mathsf{L}$ by
the condition $(\boldsymbol{f}_i, \boldsymbol{e}_i) = 1$, $(\boldsymbol{f}_i, \boldsymbol{e}_j) = 0$ for $i \ne j$. Thus an orthonormal basis is one
that is its own dual.

In the same way, we can assume that for an arbitrary linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$, the dual transformation $\mathcal{A}^* : \mathsf{L}^* \to \mathsf{L}^*$ defined in Sect. 3.7 is a linear
transformation of the Euclidean space $\mathsf{L}$ into itself and is determined by the
condition

$$(\mathcal{A}^*(\boldsymbol{x}), \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})) \tag{7.46}$$

for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$. By Theorem 3.81, the matrix of the linear transformation $\mathcal{A}$
in an arbitrary basis of the space $\mathsf{L}$ and the matrix of the dual transformation $\mathcal{A}^*$ in
the dual basis are transposes of each other. In particular, the matrices of the
transformations $\mathcal{A}$ and $\mathcal{A}^*$ in an arbitrary orthonormal basis are transposes of each other.
This is in accord with the notation $A^*$ that we have chosen for the transpose matrix.
It is easily verified also that conversely, if the matrices of transformations $\mathcal{A}$ and $\mathcal{B}$
in some orthonormal basis are transposes of each other, then the transformations $\mathcal{A}$
and $\mathcal{B}$ are dual.

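As an example, let us consider the orthogonal transformation $\mathcal{U}$, for which
by definition, the condition $(\mathcal{U}(\boldsymbol{x}), \mathcal{U}(\boldsymbol{y})) = (\boldsymbol{x}, \boldsymbol{y})$ is satisfied. By formula
(7.46), we have the equality $(\mathcal{U}(\boldsymbol{x}), \mathcal{U}(\boldsymbol{y})) = (\boldsymbol{x}, \mathcal{U}^*\mathcal{U}(\boldsymbol{y}))$, from which follows
$(\boldsymbol{x}, \mathcal{U}^*\mathcal{U}(\boldsymbol{y})) = (\boldsymbol{x}, \boldsymbol{y})$. This implies that $(\boldsymbol{x}, \mathcal{U}^*\mathcal{U}(\boldsymbol{y}) - \boldsymbol{y}) = 0$ for all vectors $\boldsymbol{x}$,
from which follows the equality $\mathcal{U}^*\mathcal{U}(\boldsymbol{y}) = \boldsymbol{y}$ for all vectors $\boldsymbol{y} \in \mathsf{L}$. In other words,
the fact that $\mathcal{U}^*\mathcal{U}$ is equal to $\mathcal{E}$, the identity transformation, is equivalent to the
property of orthogonality of the transformation $\mathcal{U}$. In matrix form, this is the
relationship (7.18).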
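In an orthonormal basis the dual transformation has the transposed matrix, so the statement above can be illustrated on a concrete matrix. The sketch below (Python; the helper names are ours, and a plane rotation is chosen only as a convenient example of an orthogonal transformation) checks that $U^* U = E$.

```python
import math


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


t = 0.7  # an arbitrary rotation angle
U = [[math.cos(t), -math.sin(t)],
     [math.sin(t), math.cos(t)]]
product = matmul(transpose(U), U)  # should be the identity matrix E
print(product)
```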

Definition 7.33 A linear transformation $\mathcal{A}$ of a Euclidean space is called symmetric
or self-dual if $\mathcal{A}^* = \mathcal{A}$.

In other words, for a symmetric transformation $\mathcal{A}$ and arbitrary vectors $\boldsymbol{x}$ and $\boldsymbol{y}$,
the following condition must be satisfied:

$$(\mathcal{A}(\boldsymbol{x}), \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})), \tag{7.47}$$

that is, the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\mathcal{A}(\boldsymbol{x}), \boldsymbol{y})$ is symmetric. As we have seen, from
this it follows that in an arbitrary orthonormal basis, the matrix of the transformation
$\mathcal{A}$ is symmetric.

Symmetric linear transformations play a very large role in mathematics and its
applications. Their most essential applications relate to quantum mechanics, where
symmetric transformations of infinite-dimensional Hilbert space (see the note on
p. 214) correspond to what are called observed physical quantities. We shall,
however, restrict our attention to finite-dimensional spaces. As we shall see in the sequel,
even with this restriction, the theory of symmetric linear transformations has a great
number of applications.

The following theorem gives a basic property of symmetric linear
transformations of finite-dimensional Euclidean spaces.

Theorem 7.34 Every symmetric linear transformation of a real vector space has an
eigenvector.

In view of the very large number of applications of this theorem, we shall present
three proofs, based on different principles.

Proof of Theorem 7.34 First proof. Let $\mathcal{A}$ be a symmetric linear transformation
of a Euclidean space $\mathsf{L}$. If $\dim \mathsf{L} > 2$, then by Theorem 4.22, it has a one- or
two-dimensional invariant subspace $\mathsf{L}'$. It is obvious that the restriction of the
transformation $\mathcal{A}$ to the invariant subspace $\mathsf{L}'$ is also a symmetric transformation. If $\dim \mathsf{L}' = 1$,
then we have $\mathsf{L}' = \langle \boldsymbol{e} \rangle$, where $\boldsymbol{e} \ne \boldsymbol{0}$, and this implies that $\boldsymbol{e}$ is an eigenvector.
Consequently, to prove the theorem, it suffices to show that a symmetric linear
transformation in the two-dimensional subspace $\mathsf{L}'$ has an eigenvector. Choosing in $\mathsf{L}'$ an
orthonormal basis, we obtain for $\mathcal{A}$ a symmetric matrix in this basis:

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

In order to find an eigenvector of the transformation $\mathcal{A}$, we must find a real root of
the polynomial $|A - tE|$. This polynomial has the form

$$(a - t)(c - t) - b^2 = t^2 - (a + c)t + ac - b^2$$

and has a real root if and only if its discriminant is nonnegative. But the discriminant
of this quadratic trinomial is equal to

$$(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \ge 0,$$

and the proof is complete.
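The two-dimensional case of the first proof is entirely explicit and can be turned into a small computation. The following Python sketch (the function names are ours) assumes a symmetric matrix with rows $(a, b)$ and $(b, c)$; it finds the real roots of $t^2 - (a + c)t + ac - b^2$ from the nonnegative discriminant $(a - c)^2 + 4b^2$ and produces an eigenvector for each root.

```python
import math


def symmetric_2x2_eigenvalues(a, b, c):
    """Real roots of t^2 - (a + c) t + (ac - b^2) for the matrix [[a, b], [b, c]]."""
    disc = (a - c) ** 2 + 4 * b ** 2  # discriminant, never negative
    root = math.sqrt(disc)
    return (a + c - root) / 2, (a + c + root) / 2


def eigenvector(a, b, c, lam):
    """A nonzero solution (x, y) of (a - lam) x + b y = 0, b x + (c - lam) y = 0."""
    if b != 0:
        return (b, lam - a)
    # Diagonal matrix: the coordinate axes are eigenvectors.
    return (1.0, 0.0) if abs(lam - a) <= abs(lam - c) else (0.0, 1.0)


a, b, c = 2.0, 1.0, 2.0
low, high = symmetric_2x2_eigenvalues(a, b, c)
v_low, v_high = eigenvector(a, b, c, low), eigenvector(a, b, c, high)
print(low, v_low, high, v_high)
```

For this sample matrix the two eigenvectors are proportional to $(1, -1)$ and $(1, 1)$, which are orthogonal to each other.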
Second proof. The second proof is based on the complexification $\mathsf{L}^{\mathbb{C}}$ of the real
vector space $\mathsf{L}$. Following the construction presented in Sect. 4.3, we may extend
the transformation $\mathcal{A}$ to the vectors of the space $\mathsf{L}^{\mathbb{C}}$. By Theorem 4.18, the obtained
transformation $\mathcal{A}^{\mathbb{C}} : \mathsf{L}^{\mathbb{C}} \to \mathsf{L}^{\mathbb{C}}$ will already have an eigenvector $\boldsymbol{e} \in \mathsf{L}^{\mathbb{C}}$ and
eigenvalue $\lambda \in \mathbb{C}$, so that $\mathcal{A}^{\mathbb{C}}(\boldsymbol{e}) = \lambda \boldsymbol{e}$.

We shall extend the inner product $(\boldsymbol{x}, \boldsymbol{y})$ from the space $\mathsf{L}$ to $\mathsf{L}^{\mathbb{C}}$ so that it
determines there a Hermitian form (see the definition on p. 210). It is clear that this
can be accomplished in only one way: for two vectors $\boldsymbol{a}_1 = \boldsymbol{x}_1 + i\boldsymbol{y}_1$ and
$\boldsymbol{a}_2 = \boldsymbol{x}_2 + i\boldsymbol{y}_2$ of the space $\mathsf{L}^{\mathbb{C}}$, we obtain the inner product according to the
formula

$$(\boldsymbol{a}_1, \boldsymbol{a}_2) = (\boldsymbol{x}_1, \boldsymbol{x}_2) + (\boldsymbol{y}_1, \boldsymbol{y}_2) + i\bigl((\boldsymbol{y}_1, \boldsymbol{x}_2) - (\boldsymbol{x}_1, \boldsymbol{y}_2)\bigr). \tag{7.48}$$

The verification of the fact that the inner product $(\boldsymbol{a}_1, \boldsymbol{a}_2)$ thus defined actually
determines in $\mathsf{L}^{\mathbb{C}}$ a Hermitian form is reduced to the verification of sesquilinearity (in
this case, it suffices to consider separately the product of a vector $\boldsymbol{a}_1$ and a vector $\boldsymbol{a}_2$
by a real number and by $i$) and the property of being Hermitian. Here all calculations
are completely trivial, and we shall omit them.

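An important new property of the inner product $(\boldsymbol{a}_1, \boldsymbol{a}_2)$ that we have obtained is
its positive definiteness, that is, the scalar product $(\boldsymbol{a}, \boldsymbol{a})$ of a vector with itself is real
(this follows from the Hermitian property) and $(\boldsymbol{a}, \boldsymbol{a}) > 0$ for $\boldsymbol{a} \ne \boldsymbol{0}$ (this is a direct
consequence of formula (7.48), for $\boldsymbol{x}_1 = \boldsymbol{x}_2$, $\boldsymbol{y}_1 = \boldsymbol{y}_2$). It is obvious that for the new inner
product we also have an analogue of the relationship (7.47), that is,

$$(\mathcal{A}^{\mathbb{C}}(\boldsymbol{a}_1), \boldsymbol{a}_2) = (\boldsymbol{a}_1, \mathcal{A}^{\mathbb{C}}(\boldsymbol{a}_2)); \tag{7.49}$$

in other words, the form $\varphi(\boldsymbol{a}_1, \boldsymbol{a}_2) = (\mathcal{A}^{\mathbb{C}}(\boldsymbol{a}_1), \boldsymbol{a}_2)$ is Hermitian. Let us apply (7.49)
to the vectors $\boldsymbol{a}_1 = \boldsymbol{a}_2 = \boldsymbol{e}$. Then we obtain $(\lambda \boldsymbol{e}, \boldsymbol{e}) = (\boldsymbol{e}, \lambda \boldsymbol{e})$. Taking into
account the Hermitian property, we have the equalities $(\lambda \boldsymbol{e}, \boldsymbol{e}) = \lambda (\boldsymbol{e}, \boldsymbol{e})$ and
$(\boldsymbol{e}, \lambda \boldsymbol{e}) = \bar{\lambda} (\boldsymbol{e}, \boldsymbol{e})$, from which it follows that $\lambda (\boldsymbol{e}, \boldsymbol{e}) = \bar{\lambda} (\boldsymbol{e}, \boldsymbol{e})$. Since $(\boldsymbol{e}, \boldsymbol{e}) > 0$, we derive
from this that $\lambda = \bar{\lambda}$, that is, the number $\lambda$ is real. Thus the characteristic
polynomial $|\mathcal{A}^{\mathbb{C}} - t\mathcal{E}|$ of the transformation $\mathcal{A}^{\mathbb{C}}$ has a real root $\lambda$. But a basis of the space
$\mathsf{L}$ as a space over $\mathbb{R}$ is a basis of the space $\mathsf{L}^{\mathbb{C}}$ over $\mathbb{C}$, and the matrix of the
transformation $\mathcal{A}^{\mathbb{C}}$ in this basis coincides with the matrix of the transformation $\mathcal{A}$. In
other words, $|\mathcal{A}^{\mathbb{C}} - t\mathcal{E}| = |\mathcal{A} - t\mathcal{E}|$, which implies that the characteristic
polynomial $|\mathcal{A} - t\mathcal{E}|$ of the transformation $\mathcal{A}$ has a real root $\lambda$, and this implies that the
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ has an eigenvector in the space $\mathsf{L}$.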

Third proof. The third proof rests on certain facts from analysis, which we now
introduce. We first observe that a Euclidean space can be naturally converted into a
metric space by defining the distance $r(\boldsymbol{x}, \boldsymbol{y})$ between two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ by the
relationship $r(\boldsymbol{x}, \boldsymbol{y}) = |\boldsymbol{x} - \boldsymbol{y}|$. Thus in the Euclidean space $\mathsf{L}$ we have the notions of
convergence, limit, continuous functions, and closed and bounded sets; see p. xvii.

The Bolzano-Weierstrass theorem asserts that for an arbitrary closed and
bounded set $X$ in a finite-dimensional Euclidean space $\mathsf{L}$ and an arbitrary
continuous function $\varphi(\boldsymbol{x})$ on $X$ there exists a vector $\boldsymbol{x}_0 \in X$ at which $\varphi(\boldsymbol{x})$ assumes its
maximum value: that is, $\varphi(\boldsymbol{x}_0) \ge \varphi(\boldsymbol{x})$ for all $\boldsymbol{x} \in X$. This theorem is well known
from real analysis in the case that the set $X$ is an interval of the real line. Its proof in
the general case is exactly the same and is usually presented somewhat later. Here
we shall use the theorem without offering a proof.

Let us apply the Bolzano-Weierstrass theorem to the set $X$ consisting of all
vectors $\boldsymbol{x}$ of the space $\mathsf{L}$ such that $|\boldsymbol{x}| = 1$, that is, to the sphere of radius 1, and to the
function $\varphi(\boldsymbol{x}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}))$. This function is continuous not only on $X$, but also on
the entire space $\mathsf{L}$. Indeed, it suffices to choose in the space $\mathsf{L}$ an arbitrary basis and
to write down in it the inner product $(\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}))$ as a quadratic form in the coordinates
of the vector $\boldsymbol{x}$. Of importance to us is solely the fact that as a result, we obtain a
polynomial in the coordinates. After this, it suffices to use the well-known theorem
that states that the sum and product of continuous functions are continuous. Then
the question is reduced to a verification of the fact that an arbitrary coordinate of the
vector $\boldsymbol{x}$ is a continuous function of $\boldsymbol{x}$, but this is completely obvious.

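Thus the function $(\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}))$ assumes its maximum over the set $X$ at some $\boldsymbol{x}_0 = \boldsymbol{e}$.
Let us denote this value by $\lambda$. Consequently, $(\boldsymbol{x}, \mathcal{A}(\boldsymbol{x})) \le \lambda$ for every $\boldsymbol{x}$ for which
$|\boldsymbol{x}| = 1$. For every nonnull vector $\boldsymbol{y}$, we set $\boldsymbol{x} = \boldsymbol{y}/|\boldsymbol{y}|$. Then $|\boldsymbol{x}| = 1$, and applying
to this vector the inequality above, we see that $(\boldsymbol{y}, \mathcal{A}(\boldsymbol{y})) \le \lambda (\boldsymbol{y}, \boldsymbol{y})$ for all $\boldsymbol{y}$ (this
obviously holds as well for $\boldsymbol{y} = \boldsymbol{0}$).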

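Let us prove that the number $\lambda$ is an eigenvalue of the transformation $\mathcal{A}$. To this
end, let us write the condition that defines $\lambda$ in the form

$$(\boldsymbol{y}, \mathcal{A}(\boldsymbol{y})) \le \lambda (\boldsymbol{y}, \boldsymbol{y}), \quad \lambda = (\boldsymbol{e}, \mathcal{A}(\boldsymbol{e})), \quad |\boldsymbol{e}| = 1, \tag{7.50}$$

for an arbitrary vector $\boldsymbol{y} \in \mathsf{L}$.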

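Let us apply (7.50) to the vector $\boldsymbol{y} = \boldsymbol{e} + \varepsilon \boldsymbol{z}$, where both the scalar $\varepsilon$ and the vector
$\boldsymbol{z} \in \mathsf{L}$ are thus far arbitrary. Expanding the expressions
$(\boldsymbol{y}, \mathcal{A}(\boldsymbol{y})) = (\boldsymbol{e} + \varepsilon \boldsymbol{z}, \mathcal{A}(\boldsymbol{e}) + \varepsilon \mathcal{A}(\boldsymbol{z}))$ and $(\boldsymbol{y}, \boldsymbol{y}) = (\boldsymbol{e} + \varepsilon \boldsymbol{z}, \boldsymbol{e} + \varepsilon \boldsymbol{z})$, we obtain the inequality

$$(\boldsymbol{e}, \mathcal{A}(\boldsymbol{e})) + \varepsilon (\boldsymbol{e}, \mathcal{A}(\boldsymbol{z})) + \varepsilon (\boldsymbol{z}, \mathcal{A}(\boldsymbol{e})) + \varepsilon^2 (\boldsymbol{z}, \mathcal{A}(\boldsymbol{z}))
\le \lambda \bigl( (\boldsymbol{e}, \boldsymbol{e}) + \varepsilon (\boldsymbol{e}, \boldsymbol{z}) + \varepsilon (\boldsymbol{z}, \boldsymbol{e}) + \varepsilon^2 (\boldsymbol{z}, \boldsymbol{z}) \bigr).$$

In view of the symmetry of the transformation $\mathcal{A}$, on the basis of the properties of
Euclidean spaces and recalling that $(\boldsymbol{e}, \boldsymbol{e}) = 1$, $(\boldsymbol{e}, \mathcal{A}(\boldsymbol{e})) = \lambda$, after canceling the
common term $(\boldsymbol{e}, \mathcal{A}(\boldsymbol{e})) = \lambda (\boldsymbol{e}, \boldsymbol{e})$ on both sides of the above inequality, we obtain

$$2\varepsilon (\boldsymbol{e}, \mathcal{A}(\boldsymbol{z}) - \lambda \boldsymbol{z}) + \varepsilon^2 \bigl( (\boldsymbol{z}, \mathcal{A}(\boldsymbol{z})) - \lambda (\boldsymbol{z}, \boldsymbol{z}) \bigr) \le 0. \tag{7.51}$$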

Let us now note that every expression $a\varepsilon + b\varepsilon^2$ in the case $a \ne 0$ assumes a
positive value for some $\varepsilon$. For this it is necessary to choose a value $|\varepsilon|$ sufficiently
small that $a + b\varepsilon$ has the same sign as $a$, and then to choose the appropriate sign
for $\varepsilon$. Thus the inequality (7.51) always leads to a contradiction except in the case
$(\boldsymbol{e}, \mathcal{A}(\boldsymbol{z}) - \lambda \boldsymbol{z}) = 0$.

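If for some vector $\boldsymbol{z} \ne \boldsymbol{0}$, we have $\mathcal{A}(\boldsymbol{z}) = \lambda \boldsymbol{z}$, then $\boldsymbol{z}$ is an eigenvector of the
transformation $\mathcal{A}$ with eigenvalue $\lambda$, which is what we wished to prove. But if
$\mathcal{A}(\boldsymbol{z}) - \lambda \boldsymbol{z} \ne \boldsymbol{0}$ for all $\boldsymbol{z} \ne \boldsymbol{0}$, then the kernel of the transformation $\mathcal{A} - \lambda \mathcal{E}$ is equal
to $(\boldsymbol{0})$. From Theorem 3.68 it follows that then the transformation $\mathcal{A} - \lambda \mathcal{E}$ is an
isomorphism, and its image is equal to all of the space $\mathsf{L}$. This implies that for an
arbitrary $\boldsymbol{u} \in \mathsf{L}$, it is possible to choose a vector $\boldsymbol{z} \in \mathsf{L}$ such that $\boldsymbol{u} = \mathcal{A}(\boldsymbol{z}) - \lambda \boldsymbol{z}$. Then
taking into account the relationship $(\boldsymbol{e}, \mathcal{A}(\boldsymbol{z}) - \lambda \boldsymbol{z}) = 0$, we obtain that an arbitrary
vector $\boldsymbol{u} \in \mathsf{L}$ satisfies the equality $(\boldsymbol{e}, \boldsymbol{u}) = 0$. But this is impossible at least for $\boldsymbol{u} = \boldsymbol{e}$,
since $|\boldsymbol{e}| = 1$. □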

The further theory of symmetric transformations is constructed on the basis of
some very simple considerations.

Theorem 7.35 If a subspace $\mathsf{L}'$ of a Euclidean space $\mathsf{L}$ is invariant with respect
to the symmetric transformation $\mathcal{A}$, then its orthogonal complement $(\mathsf{L}')^{\perp}$ is also
invariant.

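Proof The result is a direct consequence of the definitions. Let $\boldsymbol{y}$ be a vector in
$(\mathsf{L}')^{\perp}$. Then $(\boldsymbol{x}, \boldsymbol{y}) = 0$ for all $\boldsymbol{x} \in \mathsf{L}'$. In view of the symmetry of the transformation
$\mathcal{A}$, we have the relationship

$$(\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})) = (\mathcal{A}(\boldsymbol{x}), \boldsymbol{y}),$$

while taking into account the invariance of $\mathsf{L}'$ yields that $\mathcal{A}(\boldsymbol{x}) \in \mathsf{L}'$. This implies
that $(\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})) = 0$ for all vectors $\boldsymbol{x} \in \mathsf{L}'$, that is, $\mathcal{A}(\boldsymbol{y}) \in (\mathsf{L}')^{\perp}$, and this completes
the proof of the theorem. □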

Combining Theorems 7.34 and 7.35 yields a fundamental result in the theory of
symmetric transformations.

Theorem 7.36 For every symmetric transformation $\mathcal{A}$ of a Euclidean space $\mathsf{L}$ of
finite dimension, there exists an orthonormal basis of this space consisting of
eigenvectors of the transformation $\mathcal{A}$.

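Proof The proof is by induction on the dimension of the space $\mathsf{L}$. Indeed, by
Theorem 7.34, the transformation $\mathcal{A}$ has at least one eigenvector $\boldsymbol{e}$. Let us set

$$\mathsf{L} = \langle \boldsymbol{e} \rangle \oplus \langle \boldsymbol{e} \rangle^{\perp},$$

where $\langle \boldsymbol{e} \rangle^{\perp}$ has dimension $n - 1$, and by Theorem 7.35, is invariant with respect
to $\mathcal{A}$. By the induction hypothesis, in the space $\langle \boldsymbol{e} \rangle^{\perp}$ there exists a required basis. If
we add the vector $\boldsymbol{e}$, normalized to length 1, to this basis, we obtain the desired basis in $\mathsf{L}$. □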
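An orthonormal basis of eigenvectors can also be found numerically. The sketch below does not follow the inductive construction of the proof; it uses instead the classical Jacobi rotation method, in which the matrix is repeatedly conjugated by plane rotations until its off-diagonal entries vanish. The function name and tolerances are ours, and the code is meant only as an illustration in Python with the standard library.

```python
import math


def jacobi_eigen(matrix, max_sweeps=100, tol=1e-24):
    """Eigenvalues and an orthonormal eigenvector basis of a real symmetric matrix.

    Returns (eigenvalues, eigenvectors), where eigenvectors[k] belongs to eigenvalues[k].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle with |theta| <= pi/4 that makes the (p, q) entry vanish.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p, q:  A <- A J,  V <- V J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q:  A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]  # columns of V
    return eigenvalues, eigenvectors


A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
values, vectors = jacobi_eigen(A)
print(sorted(values))
```

For the sample matrix the eigenvalues are $2 - \sqrt{2}$, $2$, and $2 + \sqrt{2}$, and the returned vectors are pairwise orthogonal unit vectors.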

Let us discuss this result. For a symmetric transformation $\mathcal{A}$, we have an
orthonormal basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ consisting of eigenvectors. But to what extent is such a
basis uniquely determined? Suppose the vector $\boldsymbol{e}_i$ has the associated eigenvalue $\lambda_i$.
Then in our basis, the transformation $\mathcal{A}$ has matrix

$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \tag{7.52}$$

But as we saw in Sect. 4.1, the eigenvalues of a linear transformation $\mathcal{A}$ coincide
with the roots of the characteristic polynomial

$$|\mathcal{A} - t\mathcal{E}| = |A - tE| = \prod_{i=1}^{n} (\lambda_i - t).$$

Thus the eigenvalues $\lambda_1, \ldots, \lambda_n$ of the transformation $\mathcal{A}$ are uniquely determined.
Suppose that the distinct values among them are $\lambda_1, \ldots, \lambda_k$. If we assemble all the
vectors of the constructed orthonormal basis that correspond to one and the same
eigenvalue $\lambda_i$ (from the set $\lambda_1, \ldots, \lambda_k$ of distinct eigenvalues) and consider the
subspace spanned by them, then we obviously obtain the eigensubspace $\mathsf{L}_{\lambda_i}$ (see the
definition on p. 138). We then have the orthogonal decomposition

$$\mathsf{L} = \mathsf{L}_{\lambda_1} \oplus \cdots \oplus \mathsf{L}_{\lambda_k}, \quad \text{where } \mathsf{L}_{\lambda_i} \perp \mathsf{L}_{\lambda_j} \text{ for all } i \ne j. \tag{7.53}$$

The restriction of $\mathcal{A}$ to the eigensubspace $\mathsf{L}_{\lambda_i}$ gives the transformation $\lambda_i \mathcal{E}$, and in this
subspace, every orthonormal basis consists of eigenvectors (with eigenvalue $\lambda_i$).

Thus we see that a symmetric transformation $\mathcal{A}$ uniquely defines only the
eigensubspaces $\mathsf{L}_{\lambda_i}$, while in each of them, one can choose an orthonormal basis as one
likes. On combining these bases, we obtain an arbitrary basis of the space $\mathsf{L}$
satisfying the conditions of Theorem 7.36.

Let us note that every eigenvector of the transformation $\mathcal{A}$ lies in one of the
subspaces $\mathsf{L}_{\lambda_i}$. If two eigenvectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are associated with different eigenvalues
$\lambda_i \ne \lambda_j$, then they lie in different subspaces $\mathsf{L}_{\lambda_i}$ and $\mathsf{L}_{\lambda_j}$, and in view of the
orthogonality of the decomposition (7.53), they must be orthogonal. We thus obtain the
following result.

Theorem 7.37 The eigenvectors of a symmetric transformation corresponding to 
different eigenvalues are orthogonal. 

We note that this theorem can also be easily proved by direct calculation. 

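Proof of Theorem 7.37 Let $\boldsymbol{x}$ and $\boldsymbol{y}$ be eigenvectors of a symmetric transformation
$\mathcal{A}$ corresponding to distinct eigenvalues $\lambda_i$ and $\lambda_j$. Let us substitute the expressions
$\mathcal{A}(\boldsymbol{x}) = \lambda_i \boldsymbol{x}$ and $\mathcal{A}(\boldsymbol{y}) = \lambda_j \boldsymbol{y}$ into the equality $(\mathcal{A}(\boldsymbol{x}), \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$. From this
we obtain $(\lambda_i - \lambda_j)(\boldsymbol{x}, \boldsymbol{y}) = 0$, and since $\lambda_i \ne \lambda_j$, we have $(\boldsymbol{x}, \boldsymbol{y}) = 0$. □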

Theorem 7.36 is often formulated conveniently as a theorem about quadratic
forms using Theorem 6.3 from Sect. 6.1 and the possibility of identifying the space
$\mathsf{L}^*$ with $\mathsf{L}$ if the space $\mathsf{L}$ is equipped with an inner product. Indeed, Theorem 6.3
shows that every bilinear form $\varphi$ on a Euclidean space $\mathsf{L}$ can be represented in the
form

$$\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y})), \tag{7.54}$$

where $\mathcal{A}$ is the linear transformation of the space $\mathsf{L}$ to $\mathsf{L}^*$ uniquely defined by the
bilinear form $\varphi$; that is, if we make the identification of $\mathsf{L}^*$ with $\mathsf{L}$, it is a transformation
of the space $\mathsf{L}$ into itself.

It is obvious that the symmetry of the transformation $\mathcal{A}$ coincides with the
symmetry of the bilinear form $\varphi$. Therefore, the bijection between symmetric
bilinear forms and linear transformations established above yields the same
correspondence between quadratic forms and symmetric linear transformations of a Euclidean
space $\mathsf{L}$. Moreover, in view of relationship (7.54), to the symmetric transformation $\mathcal{A}$
there corresponds the quadratic form

$$\psi(\boldsymbol{x}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{x})),$$

and every quadratic form $\psi(\boldsymbol{x})$ has a unique representation in this form.

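If in some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, the transformation $\mathcal{A}$ has a diagonal matrix (7.52),
then for the vector $\boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_n \boldsymbol{e}_n$, the quadratic form $\psi(\boldsymbol{x})$ has in this basis
the canonical form

$$\psi(\boldsymbol{x}) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2. \tag{7.55}$$

Thus Theorem 7.36 is equivalent to the following.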

Theorem 7.38 For any quadratic form in a finite-dimensional Euclidean space,
there exists an orthonormal basis in which it has the canonical form (7.55).

Theorem 7.38 is sometimes conveniently formulated as a theorem about arbitrary
vector spaces.

Theorem 7.39 For two quadratic forms in a finite-dimensional vector space, one of
which is positive definite, there exists a basis (not necessarily orthonormal) in which
they both have canonical form (7.55).

In this case, we say that in a suitable basis, these quadratic forms are reduced to
a sum of squares (even if there are negative coefficients $\lambda_i$ in formula (7.55)).

Proof of Theorem 7.39 Let $\psi_1(\boldsymbol{x})$ and $\psi_2(\boldsymbol{x})$ be two such quadratic forms, one of
which, let it be $\psi_1(\boldsymbol{x})$, is positive definite. By Theorem 6.10, there exists, in the
vector space $\mathsf{L}$ in question, a basis in which the form $\psi_1(\boldsymbol{x})$ has the canonical form
(7.55). Since by assumption, the quadratic form $\psi_1(\boldsymbol{x})$ is positive definite, it follows
that in formula (7.55), all the numbers $\lambda_i$ are positive, and therefore, there exists a
basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$ in which $\psi_1(\boldsymbol{x})$ is brought into the form

$$\psi_1(\boldsymbol{x}) = x_1^2 + \cdots + x_n^2. \tag{7.56}$$

Let us consider as the scalar product $(\boldsymbol{x}, \boldsymbol{y})$ in the space $\mathsf{L}$ the symmetric bilinear
form $\varphi(\boldsymbol{x}, \boldsymbol{y})$, associated by Theorem 6.6 with the quadratic form $\psi_1(\boldsymbol{x})$. We thereby
convert $\mathsf{L}$ into a Euclidean space.

As can be seen from formulas (6.14) and (7.56), the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ for this inner
product is orthonormal. Then by Theorem 7.38, there exists an orthonormal basis
$\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ of the space $\mathsf{L}$ in which the form $\psi_2(\boldsymbol{x})$ has canonical form (7.55). But
since the basis $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_n$ is orthonormal with respect to the inner product that we
defined with the help of the quadratic form $\psi_1(\boldsymbol{x})$, then in this basis, $\psi_1(\boldsymbol{x})$ as before
takes the form (7.56), and that completes the proof of the theorem. □

Remark 7.40 It is obvious that Theorem 7.39 remains true if in its formulation we
replace the condition of positive definiteness of one of the forms by the condition
of negative definiteness. Indeed, if $\psi(\boldsymbol{x})$ is a negative definite quadratic form, then
the form $-\psi(\boldsymbol{x})$ is positive definite, and both of these assume canonical form in one
and the same basis.

Without the assumption of positive (or negative) definiteness of one of the
quadratic forms, Theorem 7.39 is no longer true. To prove this, let us derive one
necessary (but not sufficient) condition for two quadratic forms $\psi_1(\boldsymbol{x})$ and $\psi_2(\boldsymbol{x})$ to
be simultaneously reduced to a sum of squares. Let $A_1$ and $A_2$ be their matrices in
some basis. If the quadratic forms $\psi_1(\boldsymbol{x})$ and $\psi_2(\boldsymbol{x})$ are simultaneously reducible to
sums of squares, then in some other basis, their matrices $A'_1$ and $A'_2$ will be diagonal,
that is,

$$A'_1 = \begin{pmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_n \end{pmatrix}, \qquad
A'_2 = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix}.$$

Then the polynomial $|A'_1 t + A'_2|$ is equal to $\prod_{i=1}^{n} (\alpha_i t + \beta_i)$, that is, it can be factored
as a product of linear factors $\alpha_i t + \beta_i$. But by formula (6.10) for replacing the matrix
of a bilinear form through a change of basis, the matrices $A_1$, $A'_1$ and $A_2$, $A'_2$ are
related by

$$A'_1 = C^* A_1 C, \qquad A'_2 = C^* A_2 C,$$

where $C$ is some nonsingular matrix, that is, $|C| \ne 0$. Therefore,

$$|A'_1 t + A'_2| = |C^* (A_1 t + A_2) C| = |C^*| \, |A_1 t + A_2| \, |C|,$$

from which taking into account the equality $|C^*| = |C|$, we obtain the relationship

$$|A_1 t + A_2| = |C|^{-2} \, |A'_1 t + A'_2|,$$

from which it follows that the polynomial $|A_1 t + A_2|$ can also be factored into
linear factors. Thus for two quadratic forms $\psi_1(\boldsymbol{x})$ and $\psi_2(\boldsymbol{x})$ with matrices $A_1$ and
$A_2$ to be simultaneously reduced each to a sum of squares, it is necessary that the
polynomial $|A_1 t + A_2|$ be factorable into real linear factors.




Now for $n = 2$ we set $\psi_1(x) = x_1^2 - x_2^2$ and $\psi_2(x) = 2x_1 x_2$. These quadratic forms
are neither positive definite nor negative definite. Their matrices have the form

$$
A_1 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
\qquad
A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
$$

and it is obvious that the polynomial $|A_1 t + A_2| = -(t^2 + 1)$ cannot be factored into
real linear factors. This implies that the quadratic forms $\psi_1(x)$ and $\psi_2(x)$ cannot
simultaneously be reduced to sums of squares.
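
The necessary condition can be checked mechanically for 2×2 matrices. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the pair of matrices used is the one that yields the polynomial $-(t^2 + 1)$ quoted above. It expands $|A_1 t + A_2|$ into coefficients and tests the sign of the discriminant.

```python
def pencil_det_coeffs(a1, a2):
    """Coefficients (c2, c1, c0) of det(a1*t + a2) = c2*t^2 + c1*t + c0 for 2x2 matrices."""
    (p, q), (r, s) = a1
    (u, v), (w, x) = a2
    # det [[p*t + u, q*t + v], [r*t + w, s*t + x]] expanded in powers of t
    c2 = p * s - q * r
    c1 = p * x + u * s - q * w - v * r
    c0 = u * x - v * w
    return c2, c1, c0


def has_real_linear_factors(coeffs):
    """True if c2*t^2 + c1*t + c0 (with c2 != 0) splits into real linear factors."""
    c2, c1, c0 = coeffs
    return c1 * c1 - 4 * c2 * c0 >= 0


A1 = [[1, 0], [0, -1]]   # matrix of x1^2 - x2^2
A2 = [[0, 1], [1, 0]]    # symmetric matrix for which det(A1*t + A2) = -(t^2 + 1)
coeffs = pencil_det_coeffs(A1, A2)
print(coeffs, has_real_linear_factors(coeffs))
```

Replacing `A1` by the identity matrix (a positive definite form) gives the polynomial $t^2 - 1$, which does split, in agreement with Theorem 7.39.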

The question of reducing pairs of quadratic forms with complex coefficients to 
sums of squares (with the help of a complex linear transformation) is examined in 
detail, for instance, in the book The Theory of Matrices, by F.R. Gantmacher. See 
the references section. 


Remark 7.41 The last proof of Theorem 7.34 that we gave makes it possible to
interpret the largest eigenvalue $\lambda$ of a symmetric transformation $\mathcal{A}$ as the maximum
of the quadratic form $(x, \mathcal{A}(x))$ on the sphere $|x| = 1$. Let $\lambda_i$ be the other
eigenvalues, so that $(x, \mathcal{A}(x)) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2$. Then $\lambda$ is the greatest among the
$\lambda_i$. Indeed, let us assume that the eigenvalues are numbered in descending order:
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then

$$
\lambda_1 x_1^2 + \cdots + \lambda_n x_n^2 \le \lambda_1 (x_1^2 + \cdots + x_n^2),
$$

and the maximum value of the form $(x, \mathcal{A}(x))$ on the sphere $|x| = 1$ is equal to $\lambda_1$
(it is attained at the vector with coordinates $x_1 = 1$, $x_2 = \cdots = x_n = 0$). This implies
that $\lambda_1 = \lambda$.

There is an analogous characteristic for the other eigenvalues $\lambda_i$ as well, namely
the Courant-Fischer theorem, which we shall present without proof. Let us consider
all possible vector subspaces $L' \subset L$ of dimension $k$. We restrict the quadratic form
$(x, \mathcal{A}(x))$ to the subspace $L'$ and examine its values at the intersection of $L'$ with the
unit sphere, that is, the set of all vectors $x \in L'$ that satisfy $|x| = 1$. By the
Bolzano-Weierstrass theorem, the restriction of the form $(x, \mathcal{A}(x))$ to $L'$ assumes a maximum
value $\lambda'$ at some point of the sphere, which, of course, depends on the subspace $L'$.
The Courant-Fischer theorem asserts that the smallest number thus obtained (as the
subspace $L'$ ranges over all subspaces of dimension $k$) is equal to the eigenvalue
$\lambda_{n-k+1}$.
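
As a numerical illustration of these two remarks, here is a minimal sketch that is not part of the original text: the matrix and the function names are chosen only for illustration. It scans the unit circle for the form with matrix [[2, 1], [1, 2]], whose eigenvalues are 3 and 1. The largest value found approximates $\lambda_1$; the smallest value corresponds to the case $k = 1$ of the Courant-Fischer theorem, where each subspace $L'$ is a line and the minimum over all lines is $\lambda_n$.

```python
import math


def form_value(a, x):
    """Value of the quadratic form (x, A x) for a symmetric matrix a given as nested lists."""
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def extreme_values(a, steps=3600):
    """Approximate (min, max) of (x, A x) over unit vectors x = (cos s, sin s)."""
    values = []
    for k in range(steps):
        s = 2 * math.pi * k / steps
        values.append(form_value(a, (math.cos(s), math.sin(s))))
    return min(values), max(values)


A = [[2.0, 1.0], [1.0, 2.0]]       # eigenvalues 3 and 1
smallest, largest = extreme_values(A)
print(smallest, largest)            # approximately 1 and 3
```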

Remark 7.42 Eigenvectors are connected with the question of finding maxima and
minima. Let $f(x_1, \ldots, x_n)$ be a real-valued differentiable function of $n$ real
variables. A point at which all the derivatives of the function $f$ with respect to the
variables $(x_1, \ldots, x_n)$, that is, the derivatives in all directions from this point, are
equal to zero is called a critical point of the function. It is proved in real analysis
that with some natural constraints, this condition is necessary (but not sufficient) for
the function $f$ to assume a maximum or minimum value at the point in question.
Let us consider a quadratic form $f(x) = (x, \mathcal{A}(x))$ on the unit sphere $|x| = 1$. It is
not difficult to show that for an arbitrary point on this sphere, all points sufficiently
close to it can be written in some system of coordinates such that our function $f$
can be viewed as a function of these coordinates. Then the critical points of the
function $(x, \mathcal{A}(x))$ are exactly those points of the sphere that are eigenvectors of
the symmetric transformation $\mathcal{A}$.

Fig. 7.9 An ellipsoid


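Example 7.43 Let an ellipsoid be given in three-dimensional space with coordinates
$x, y, z$ by the equation

$$
\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1. \tag{7.57}
$$

The expression on the left-hand side of (7.57) can be written in the form
$\psi(\boldsymbol{x}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}))$, where

$$
\boldsymbol{x} = (x, y, z), \qquad
\mathcal{A}(\boldsymbol{x}) = \left( \frac{x}{a^2}, \frac{y}{b^2}, \frac{z}{c^2} \right).
$$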

Let us assume that $0 < a < b < c$. Then the maximum value that the quadratic form
$\psi(x)$ takes on the sphere $|x| = 1$ is $\lambda = 1/a^2$. It is attained on the vectors $(\pm 1, 0, 0)$.
If $|\psi(x)| \le \lambda$ for $|x| = 1$, then for an arbitrary vector $y \ne 0$, setting $x = y/|y|$, we
obtain $|\psi(y)| \le \lambda |y|^2$. For the vector $y = 0$, this inequality is obvious. Therefore,
it holds in general for all $y$. For $|\psi(y)| = 1$, it then follows that $|y|^2 \ge 1/\lambda$. This
implies that the shortest vector $y$ satisfying equation (7.57) is the vector $(\pm a, 0, 0)$.
The line segments beginning at the point $(0, 0, 0)$ and ending at the points $(\pm a, 0, 0)$
are called the semiminor axes of the ellipsoid (sometimes, this same term denotes
their length). Similarly, the smallest value that the quadratic form $\psi(x)$ attains on
the sphere $|x| = 1$ is equal to $1/c^2$. It attains this value at vectors $(0, 0, \pm 1)$ on the
unit sphere. Line segments corresponding to vectors $(0, 0, \pm c)$ are called semimajor
axes of the ellipsoid. A vector $(0, \pm b, 0)$ corresponds to a critical point of the
quadratic form $\psi(x)$ that is neither a maximum nor a minimum. Such a point is
called a minimax, that is, as it moves from this point in one direction, the function
$\psi(x)$ will increase, while in moving in another direction it will decrease (see
Fig. 7.9). The line segments corresponding to the vectors $(0, \pm b, 0)$ are called the
median semiaxes of the ellipsoid.
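
The bounds in this example can be probed numerically. The sketch below is an illustration under stated assumptions rather than part of the original text: the semiaxes 1, 2, 3 and the function names are hypothetical. Every unit vector $x$ is rescaled to the point $y = x/\sqrt{\psi(x)}$ of the ellipsoid, and the resulting lengths $|y| = 1/\sqrt{\psi(x)}$ are collected.

```python
import math


def psi(v, a, b, c):
    """Left-hand side of the ellipsoid equation x^2/a^2 + y^2/b^2 + z^2/c^2."""
    x, y, z = v
    return x * x / a**2 + y * y / b**2 + z * z / c**2


def radial_lengths(a, b, c, steps=60):
    """Lengths |y| of ellipsoid points y = x / sqrt(psi(x)) for a grid of unit vectors x."""
    lengths = []
    for i in range(steps + 1):
        theta = math.pi * i / steps              # polar angle in [0, pi]
        for j in range(2 * steps):
            phi = math.pi * j / steps            # azimuth in [0, 2*pi)
            x = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            lengths.append(1.0 / math.sqrt(psi(x, a, b, c)))
    return lengths


lengths = radial_lengths(1.0, 2.0, 3.0)
print(min(lengths), max(lengths))   # approximately a = 1 and c = 3
```

The shortest and longest vectors found on the grid have lengths close to $a$ and $c$, matching the semiminor and semimajor axes.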


Everything presented thus far in this chapter (with the exception of Sect. 7.3
on the orientation of a real Euclidean space) can be transferred verbatim to complex
Euclidean spaces if the inner product is defined using the positive definite Hermitian
form $\varphi(x, y)$. The condition of positive definiteness means that for the associated
quadratic Hermitian form $\psi(x) = \varphi(x, x)$, the inequality $\psi(x) > 0$ is satisfied for
all $x \ne 0$. If we denote, as before, the inner product by $(x, y)$, the last condition can
be written in the form $(x, x) > 0$ for all $x \ne 0$.

The dual transformation $\mathcal{A}^*$, as previously, is defined by condition (7.46). But
now, the matrix of the transformation $\mathcal{A}^*$ in an orthonormal basis is obtained from
the matrix of the transformation $\mathcal{A}$ not simply by taking the transpose, but by taking
the complex conjugate of the transpose. The analogue of a symmetric transformation
is defined as a transformation $\mathcal{A}$ whose associated bilinear form $(x, \mathcal{A}(y))$ is
Hermitian.

It is a fundamental fact that in quantum mechanics, one deals with complex space.
We can formulate what was stated earlier in the following form: observed physical
quantities correspond to Hermitian forms in infinite-dimensional complex Hilbert
space.

The theory of Hermitian transformations in the finite-dimensional case is
constructed even more simply than the theory of symmetric transformations in real
spaces, since there is no need to prove analogues of Theorem 7.34: we know already
that an arbitrary linear transformation of a complex vector space has an eigenvector.
From the definition of being Hermitian, it follows that the eigenvalues of a
Hermitian transformation are real. The theorems proved in this section are valid for
Hermitian forms (with the same proofs).

In the complex case, a transformation $\mathcal{U}$ preserving the inner product is called
unitary. The reasoning carried out in Sect. 7.2 shows that for a unitary transformation
$\mathcal{U}$, there exists an orthonormal basis consisting of eigenvectors, and all
eigenvalues of the transformation $\mathcal{U}$ are complex numbers of modulus 1.
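
The two statements about eigenvalues (real for Hermitian transformations, of modulus 1 for unitary ones) can be illustrated in dimension 2. The following is a minimal sketch, not part of the original text: the helper name and both matrices are chosen only for illustration, and the eigenvalues are obtained from the characteristic polynomial $t^2 - (\operatorname{tr}) t + \det$.

```python
import cmath
import math


def eigenvalues_2x2(m):
    """Roots of t^2 - trace*t + det = 0 for a 2x2 complex matrix m."""
    (a, b), (c, d) = m
    tr = a + d
    det = a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


# A Hermitian matrix: it equals the complex conjugate of its transpose.
H = [[2, 1 - 1j], [1 + 1j, 3]]
print(eigenvalues_2x2(H))           # both imaginary parts vanish

# A unitary matrix: a real rotation preserves the inner product.
angle = 0.7
U = [[math.cos(angle), -math.sin(angle)], [math.sin(angle), math.cos(angle)]]
print([abs(ev) for ev in eigenvalues_2x2(U)])   # both moduli are close to 1
```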


7.6 Applications to Mechanics and Geometry* 

We shall present two examples from two different areas (mechanics and geometry)
in which the theorems of the previous section play a key role. Since these
questions will be taken up in other courses, we shall allow ourselves to be brief in
both the definitions and the proofs.

Example 7.44 Let us consider the motion of a mechanical system in a small
neighborhood of its equilibrium position. One says that such a system possesses $n$ degrees
of freedom if in some region, its state is determined by $n$ so-called generalized
coordinates $q_1, \ldots, q_n$, which we shall consider the coordinates of a vector $q$ in some
coordinate system, and where we will take the origin 0 to be the equilibrium position
of our system. The motion of the system determines the dependence of a vector
$q$ on time $t$. We shall assume that the equilibrium position under investigation is
determined by a strict local minimum of its potential energy $\Pi$. If this value is
equal to $c$, and the potential energy is a function $\Pi(q_1, \ldots, q_n)$ in the generalized
coordinates (it is assumed that it does not depend on time), then this implies that
$\Pi(0, \ldots, 0) = c$ and $\Pi(q_1, \ldots, q_n) > c$ for all remaining values $q_1, \ldots, q_n$ close to
zero. From the fact that a critical point of the function $\Pi$ corresponds to the
minimum value, we may conclude that at the point 0, all partial derivatives $\partial \Pi / \partial q_i$
become zero. Therefore, for an expansion of the function $\Pi(q_1, \ldots, q_n)$ as a series
in powers of the variables $q_1, \ldots, q_n$ at the point 0, the linear terms will be equal
to zero, and we obtain the expression
$\Pi(q_1, \ldots, q_n) = c + \sum_{i,j=1}^{n} b_{ij} q_i q_j + \cdots$,
where $b_{ij}$ are certain constants, and the ellipsis indicates terms of degree greater
than 2. Since we are considering motions not far from the point 0, we can disregard
those values. It is in this approximation that we shall consider this problem. That is,
we set

$$
\Pi(q_1, \ldots, q_n) = c + \sum_{i,j=1}^{n} b_{ij} q_i q_j.
$$

Since $\Pi(q_1, \ldots, q_n) > c$ for all values $q_1, \ldots, q_n$ not equal to zero, the quadratic
form $\sum_{i,j=1}^{n} b_{ij} q_i q_j$ will be positive definite.

Kinetic energy $T$ is a quadratic form in so-called generalized velocities
$dq_1/dt, \ldots, dq_n/dt$, which are also denoted by $\dot{q}_1, \ldots, \dot{q}_n$, that is,

$$
T = \sum_{i,j=1}^{n} a_{ij} \dot{q}_i \dot{q}_j, \tag{7.58}
$$

where $a_{ij} = a_{ji}$ are functions of $q$ (we assume that they do not depend on time $t$).
Considering as we did for potential energy only those values $q_i$ close to zero, we
may replace all the functions $a_{ij}$ in (7.58) by constants $a_{ij}(0)$, which is what we
shall now assume. Kinetic energy is always positive except in the case that all $\dot{q}_i$ are
equal to 0, and therefore, the quadratic form (7.58) is positive definite.

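Motion in a broad class of mechanical systems (so-called natural systems) is
described by a rather complex system of differential equations, the second-order
Lagrange equations:

$$
\frac{d}{dt} \left( \frac{\partial T}{\partial \dot{q}_i} \right) - \frac{\partial T}{\partial q_i}
= -\frac{\partial \Pi}{\partial q_i}, \qquad i = 1, \ldots, n. \tag{7.59}
$$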

Application of Theorem 7.39 makes it possible to reduce these equations in the
given situation to much simpler ones. To this end, let us find a coordinate system
in which the quadratic form $\sum_{i,j=1}^{n} a_{ij} x_i x_j$ can be brought into the form $\sum_{i=1}^{n} x_i^2$,
and the quadratic form $\sum_{i,j=1}^{n} b_{ij} x_i x_j$ into the form $\sum_{i=1}^{n} \lambda_i x_i^2$. Then in this case,
the form $\sum_{i,j=1}^{n} b_{ij} x_i x_j$ is positive definite, which implies that all $\lambda_i$ are positive.
In this system of coordinates (we shall again denote them by $q_1, \ldots, q_n$), the system
of equations (7.59) is decomposed into the independent equations

$$
\frac{d^2 q_i}{dt^2} = -\lambda_i q_i, \qquad i = 1, \ldots, n, \tag{7.60}
$$

which have the solutions $q_i = c_i \cos \sqrt{\lambda_i}\, t + d_i \sin \sqrt{\lambda_i}\, t$, where $c_i$ and $d_i$ are
arbitrary constants. This shows that "small oscillations" are periodic in each coordinate
$q_i$. Since they are bounded, it follows that our equilibrium position 0 is stable. If
we were to examine the state of equilibrium at a point that was a critical point of
potential energy $\Pi$ but not a strict minimum, then in the equations (7.60) we would
not be able to guarantee that all the $\lambda_i$ were positive. Then for those $i$ for which
$\lambda_i < 0$, we would obtain the solutions $q_i = c_i \cosh \sqrt{-\lambda_i}\, t + d_i \sinh \sqrt{-\lambda_i}\, t$, which
can grow without bound with the growth of $t$. Just as for $\lambda_i = 0$, we would obtain
an unbounded solution $q_i = c_i + d_i t$.
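
The three families of solutions can be checked against equation (7.60) numerically. The sketch below is an illustration only and not part of the original text: the function names, the step size, and the sample constants are assumptions. It estimates the second derivative by a central difference and evaluates the residual $\ddot{q} + \lambda q$, which should be close to zero.

```python
import math


def solution(lam, c, d):
    """Return a function q(t) of the form given above for the equation q'' = -lam * q."""
    if lam > 0:
        w = math.sqrt(lam)
        return lambda t: c * math.cos(w * t) + d * math.sin(w * t)
    if lam < 0:
        w = math.sqrt(-lam)
        return lambda t: c * math.cosh(w * t) + d * math.sinh(w * t)
    return lambda t: c + d * t


def residual(q, lam, t, h=1e-4):
    """Central-difference estimate of q''(t) + lam * q(t); near zero for a true solution."""
    second = (q(t + h) - 2 * q(t) + q(t - h)) / (h * h)
    return second + lam * q(t)


for lam in (4.0, -4.0, 0.0):
    q = solution(lam, 1.0, 0.5)
    print(lam, residual(q, lam, 0.7), q(10.0))
```

For $\lambda = 4$ the value $q(10)$ stays within the amplitude $\sqrt{c^2 + d^2}$, while for $\lambda = -4$ it is already very large, which reflects the stability discussion above.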

Strictly speaking, we have done only the following altogether: we have replaced
the given conditions of our problem with conditions close to them, with the result
that the problem became much simpler. Such a procedure is usual in the theory of
differential equations, where it is proved that solutions to a simplified system of
equations are in a certain sense similar to the solutions of the initial system. And
moreover, the degree of this deviation can be estimated as a function of the values
of the terms that we have ignored. This estimation takes place in a finite interval of
time whose length also depends on the value of the ignored terms. This justifies the
simplifications that we have made.

A beautiful example, which played an important role historically, is given by
lateral oscillations of a string of beads.⁴

Suppose we have a weightless and ideally flexible thread fixed at the ends. On it
are securely fastened $n$ beads with masses $m_1, \ldots, m_n$, and suppose they divide the
thread into segments of lengths $l_0, l_1, \ldots, l_n$. We shall assume that in its initial state,
the thread lies along the $x$-axis, and we shall denote by $y_1, \ldots, y_n$ the displacements
of the beads along the $y$-axis. Then the kinetic energy of this system has the form

$$
T = \frac{1}{2} \sum_{i=1}^{n} m_i \dot{y}_i^2.
$$

Assuming the tension of the thread to be constant (as we may because the
displacements are small) and equal to $\sigma$, we obtain for the potential energy the expression
$\Pi = \sigma \Delta l$, where $\Delta l = \sum_{i=0}^{n} \Delta l_i$ is the change in length of the entire thread, and
$\Delta l_i$ is the change in length of the portion of the thread corresponding to $l_i$. Then we
know the $\Delta l_i$ in terms of the $l_i$:

$$
\Delta l_i = \sqrt{l_i^2 + (y_{i+1} - y_i)^2} - l_i, \qquad i = 0, \ldots, n,
$$

where $y_0 = y_{n+1} = 0$. Expanding this expression as a series in $y_{i+1} - y_i$, we obtain
quadratic terms $\frac{1}{2 l_i} (y_{i+1} - y_i)^2$, and we may set

$$
\Pi = \frac{\sigma}{2} \sum_{i=0}^{n} \frac{1}{l_i} (y_{i+1} - y_i)^2, \qquad y_0 = y_{n+1} = 0.
$$


⁴ This example is taken from Gantmacher and Krein's book Oscillation Matrices and Kernels and
Small Vibrations of Mechanical Systems, Moscow 1950, English translation, AMS Chelsea
Publishing, 2002.

Thus in this case, the problem is reduced to simultaneously expressing two quadratic
forms in the variables $y_1, \ldots, y_n$ as sums of squares:

$$
T = \frac{1}{2} \sum_{i=1}^{n} m_i \dot{y}_i^2, \qquad
\Pi = \frac{\sigma}{2} \sum_{i=0}^{n} \frac{1}{l_i} (y_{i+1} - y_i)^2, \qquad y_0 = y_{n+1} = 0.
$$


But if the masses of all the beads are equal and they divide the thread into equal
segments, that is, $m_i = m$ and $l_i = l/(n + 1)$, $i = 1, \ldots, n$, then all the formulas can
be written in a more explicit form. In this case, we are speaking about the
simultaneous representation as the sum of squares of two forms:

$$
T = \frac{m}{2} \sum_{i=1}^{n} \dot{y}_i^2, \qquad
\Pi = \frac{\sigma (n + 1)}{l} \left( \sum_{i=1}^{n} y_i^2 - \sum_{i=0}^{n} y_i y_{i+1} \right), \qquad
y_0 = y_{n+1} = 0.
$$


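Therefore, we must use an orthogonal transformation (preserving the form $\sum_{i=1}^{n} y_i^2$)
to express as a sum of squares the form $\sum_{i=0}^{n} y_i y_{i+1}$ with matrix

$$
A = \frac{1}{2} \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
1 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 1 \\
0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}.
$$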


It would have been possible to take the standard route: find the eigenvalues
$\lambda_1, \ldots, \lambda_n$ as roots of the determinant $|A - tE|$ and eigenvectors $y$ from the system
of equations

$$
Ay = \lambda y, \tag{7.61}
$$

where $\lambda = \lambda_i$ and $y$ is the column of unknowns $y_1, \ldots, y_n$. But it is simpler to
use equations (7.61) directly. They give a system of $n$ equations in the unknowns
$y_1, \ldots, y_n$:

$$
y_2 = 2\lambda y_1, \quad y_1 + y_3 = 2\lambda y_2, \quad \ldots, \quad
y_{n-2} + y_n = 2\lambda y_{n-1}, \quad y_{n-1} = 2\lambda y_n,
$$

which can be written in the form

$$
y_{k-1} + y_{k+1} = 2\lambda y_k, \qquad k = 1, \ldots, n, \tag{7.62}
$$

where we set $y_0 = y_{n+1} = 0$. The system of equations (7.62) is called a recurrence
relation, whereby each value $y_{k+1}$ is expressed in terms of the two preceding values:
$y_k$ and $y_{k-1}$. Thus if we know two adjacent values, then we can use relationship
(7.62) to construct all the $y_k$. The condition $y_0 = y_{n+1} = 0$ is called a boundary
condition.

Let us note that for $\lambda = \pm 1$, the equation (7.62) with boundary condition $y_0 =
y_{n+1} = 0$ has only the null solution: $y_0 = \cdots = y_{n+1} = 0$. Indeed, for $\lambda = 1$, we
obtain

$$
y_2 = 2y_1, \quad y_3 = 3y_1, \quad \ldots, \quad y_n = n y_1, \quad y_{n+1} = (n + 1) y_1,
$$

from which by $y_{n+1} = 0$ it follows that $y_1 = 0$, and all $y_k$ are equal to 0. Similarly,
for $\lambda = -1$, we obtain

$$
y_2 = -2y_1, \quad y_3 = 3y_1, \quad y_4 = -4y_1, \quad \ldots, \quad
y_n = (-1)^{n-1} n y_1, \quad y_{n+1} = (-1)^n (n + 1) y_1,
$$

from which by $y_{n+1} = 0$ it follows as well that $y_1 = 0$, and again all the $y_k$ are equal
to zero. Thus for $\lambda = \pm 1$, the system of equations (7.61) has as its only solution
the vector $y = 0$, which by definition, cannot be an eigenvector. In other words, this
implies that the numbers $\pm 1$ are not eigenvalues of the matrix $A$.

There is a lovely formula for solving equation (7.62) with boundary condition
$y_0 = y_{n+1} = 0$. Let us denote by $\alpha$ and $\beta$ the roots of the quadratic equation
$z^2 - 2\lambda z + 1 = 0$. By the above reasoning, $\lambda \ne \pm 1$, and therefore, the numbers
$\alpha$ and $\beta$ are distinct and cannot equal $\pm 1$. Direct substitution shows that then for
arbitrary $A$ and $B$, the sequence $y_k = A\alpha^k + B\beta^k$ satisfies the relationship (7.62).
The coefficients $A$ and $B$ are taken to satisfy $y_0 = 0$ with $y_1$ given. The following $y_k$, as
we have seen, are determined by the relationship (7.62), and this implies that again
they are given by our formula. The conditions $y_0 = 0$, $y_1$ fixed give $B = -A$ and
$A(\alpha - \beta) = y_1$, whence $A = y_1/(\alpha - \beta)$. Thus we obtain the expression

$$
y_k = \frac{y_1}{\alpha - \beta} \left( \alpha^k - \beta^k \right). \tag{7.63}
$$


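We now use the condition $y_{n+1} = 0$, which gives $\alpha^{n+1} = \beta^{n+1}$. Moreover, since
$\alpha$ and $\beta$ are roots of the polynomial $z^2 - 2\lambda z + 1$, we have $\alpha\beta = 1$, whence $\beta = \alpha^{-1}$,
which implies that $\alpha^{2(n+1)} = 1$. From this (taking into account that $\alpha \ne \pm 1$), we
obtain

$$
\alpha = \cos \frac{\pi j}{n + 1} + i \sin \frac{\pi j}{n + 1},
$$

where $i$ is the imaginary unit, and the number $j$ assumes the values $1, \ldots, n$. Again
using the equation $z^2 - 2\lambda z + 1 = 0$, whose roots are $\alpha$ and $\beta$, we obtain $n$ distinct
values for $\lambda$:

$$
\lambda_j = \cos \frac{\pi j}{n + 1}, \qquad j = 1, \ldots, n,
$$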


since $j = n + 2, \ldots, 2n + 1$ give the same values $\lambda_j$. These are precisely the
eigenvalues of the matrix $A$. For the eigenvector $y_j$ of the associated eigenvalue $\lambda_j$, we
obtain by formula (7.63) its coordinates $y_{1j}, \ldots, y_{nj}$ in the form

$$
y_{kj} = \sin \left( \frac{\pi k j}{n + 1} \right), \qquad k = 1, \ldots, n.
$$

These formulas were derived by d'Alembert and Daniel Bernoulli. Passing to the
limit as $n \to \infty$, Lagrange derived from these the law of vibrations of a uniform
string.
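
The closed formulas for the eigenvalues and eigenvectors can be tested directly against the recurrence relation (7.62). The following is a minimal sketch and not part of the original text: the function names and the choice $n = 6$ are illustrative. It measures how far each candidate pair is from satisfying $y_{k-1} + y_{k+1} = 2\lambda y_k$ with $y_0 = y_{n+1} = 0$.

```python
import math


def eigenpair(n, j):
    """Eigenvalue cos(pi*j/(n+1)) and vector (sin(pi*k*j/(n+1))) for k = 1..n."""
    lam = math.cos(math.pi * j / (n + 1))
    y = [math.sin(math.pi * k * j / (n + 1)) for k in range(1, n + 1)]
    return lam, y


def recurrence_defect(lam, y):
    """Largest |y_{k-1} + y_{k+1} - 2*lam*y_k| over k = 1..n, with y_0 = y_{n+1} = 0."""
    padded = [0.0] + list(y) + [0.0]
    return max(abs(padded[k - 1] + padded[k + 1] - 2 * lam * padded[k])
               for k in range(1, len(y) + 1))


n = 6
for j in range(1, n + 1):
    lam, y = eigenpair(n, j)
    print(j, round(lam, 6), recurrence_defect(lam, y))   # defects are near zero
```

Since the eigenvalues are distinct, eigenvectors for different $j$ are orthogonal, which can be confirmed by computing their dot products.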

Example 7.45 Let us consider in an $n$-dimensional real Euclidean space $L$ the subset
$X$ given by the equation

$$
F(x_1, \ldots, x_n) = 0 \tag{7.64}
$$

in some coordinate system. Such a subset $X$ is called a hypersurface and consists of
all vectors $x = (x_1, \ldots, x_n)$ of the Euclidean space $L$ whose coordinates satisfy the
equation⁵ (7.64). Using the change-of-coordinates formula (3.36), we see that the
property of the subset $X \subset L$ being a hypersurface does not depend on the choice
of coordinates, that is, on the choice of the basis of $L$. Then if we assume that the
beginning of every vector is located at a single fixed point, then every vector $x =
(x_1, \ldots, x_n)$ can be identified with its endpoint, a point of the given space. In order
to conform to more customary terminology, as we continue with this example, we
shall call the vectors $x$ of which the hypersurface $X$ consists its points.

We shall assume that $F(0) = 0$ and that the function $F(x_1, \ldots, x_n)$ is
differentiable in each of its arguments as many times as necessary. It is easily verified that
this condition also does not depend on the choice of basis. Let us assume in
addition that 0 is not a critical point of the hypersurface $X$, that is, that not all partial
derivatives $\partial F(0)/\partial x_i$ are equal to zero. In other words, if we introduce the vector
$\operatorname{grad} F = (\partial F/\partial x_1, \ldots, \partial F/\partial x_n)$, called the gradient of the function $F$, then this
implies that $\operatorname{grad} F(0) \ne 0$.

We shall be interested in local properties of the hypersurface $X$, that is,
properties associated with points close to 0. With the assumptions that we have made,
the implicit function theorem, known from analysis, shows that near 0, the
coordinates $x_1, \ldots, x_n$ of each point of the hypersurface $X$ can be represented as a
function of $n - 1$ arguments $u_1, \ldots, u_{n-1}$, and furthermore, for each point, the values
$u_1, \ldots, u_{n-1}$ are uniquely determined. It is possible to choose as $u_1, \ldots, u_{n-1}$ some
$n - 1$ of the coordinates $x_1, \ldots, x_n$, after determining the remaining coordinate $x_k$
from equation (7.64), for which must be satisfied only the condition $\frac{\partial F}{\partial x_k}(0) \ne 0$ for
the given $k$, which holds because of the assumption $\operatorname{grad} F(0) \ne 0$. The functions
that determine the dependence of the coordinates $x_1, \ldots, x_n$ of a point of the
hypersurface $X$ on the arguments $u_1, \ldots, u_{n-1}$ are differentiable in all arguments as many
times as the original function $F(x_1, \ldots, x_n)$.


⁵ The more customary point of view, when the hypersurface (for example, a curve or surface)
consists of points, requires the consideration of an n-dimensional space consisting of points (otherwise
affine space), which will be introduced in the following chapter.

The hyperplane defined by the equation

$$
\sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(0)\, x_i = 0
$$

is called the tangent space or tangent hyperplane to the hypersurface $X$ at the point
0 and is denoted by $T_0 X$. In the case that the basis of the Euclidean space $L$ is
orthonormal, this equation can also be written in the form $(\operatorname{grad} F(0), x) = 0$. As a
subspace of the Euclidean space $L$, the tangent space $T_0 X$ is also a Euclidean space.

The set of vectors depending on the parameter $t$ taking values on some interval
of the real line, that is, $x(t) = (x_1(t), \ldots, x_n(t))$, is called a smooth curve if all
functions $x_i(t)$ are differentiable a sufficient number of times and if for every value
of the parameter $t$, not all the derivatives $dx_i/dt$ are equal to zero. In analogy to
what was said above about hypersurfaces, we may visualize the curve as consisting
of points $A(t)$, where each $A(t)$ is the endpoint of some vector $x(t)$, while all the
vectors $x(t)$ begin at a certain fixed point $O$. In what follows, we shall refer to the
vectors $x$ that constitute the curve as its points.

We say that a curve $\gamma$ passes through the point $x_0$ if $x(t_0) = x_0$ for some value
of the parameter $t_0$. It is clear that here we may always assume that $t_0 = 0$. Indeed,
let us consider a different curve $\tilde{x}(t) = (\tilde{x}_1(t), \ldots, \tilde{x}_n(t))$, where the functions $\tilde{x}_i(t)$
are equal to $x_i(t + t_0)$. This can also be written in the form $\tilde{x}(\tau) = x(t)$, where we
have introduced a new parameter $\tau$ related to the old one by $\tau = t - t_0$.

Generally speaking, for a curve we may make an arbitrary change of parameter
by the formula $t = \psi(\tau)$, where the function $\psi$ defines a continuously differentiable
bijective mapping of one interval to another. Under such a change, a curve,
considered as a set of points (or vectors), will remain the same. From this it follows that one
and the same curve can be written in a variety of ways using various parameters.⁶

We now introduce the vector $\frac{dx}{dt} = \left( \frac{dx_1}{dt}, \ldots, \frac{dx_n}{dt} \right)$. Suppose the curve $\gamma$ passes
through the point 0 for $t = 0$. Then the vector $p = \frac{dx}{dt}(0)$ is called a tangent vector
to the curve $\gamma$ at the point 0. It depends, of course, on the choice of parameter $t$
defining the curve. Under a change of parameter $t = \psi(\tau)$, we have

$$
\frac{dx}{d\tau} = \frac{dx}{dt} \cdot \frac{dt}{d\tau} = \frac{dx}{dt} \cdot \psi'(\tau), \tag{7.65}
$$


and the tangent vector $p$ is multiplied by a constant equal to the value of the
derivative $\psi'(0)$. Using this fact, it is possible to arrange things so that $\left| \frac{dx}{dt}(t) \right| = 1$ for all $t$
close to 0. Such a parameter is said to be natural. The condition that the curve $x(t)$
belong to the hypersurface (7.64) gives the equality $F(x(t)) = 0$, which is satisfied
for all $t$. Differentiating this relationship with respect to $t$, we obtain that the vector
$p$ lies in the space $T_0 X$. And conversely, an arbitrary vector contained in $T_0 X$ can
be represented in the form $\frac{dx}{dt}(0)$ for some curve $x(t)$. This curve, of course, is not
uniquely determined. Curves whose tangent vectors $p$ are proportional are said to
be tangent at the point 0.

⁶ For example, the circle of radius 1 with center at the origin with Cartesian coordinates x, y can be
defined not only by the formula $x = \cos t$, $y = \sin t$, but also by the formula $x = \cos \tau$, $y = -\sin \tau$
(with the replacement $t = -\tau$), or by the formula $x = \sin \tau$, $y = \cos \tau$ (replacement $t = \frac{\pi}{2} - \tau$).

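Let us denote by $n$ a unit vector orthogonal to the tangent space $T_0 X$. There are
two such vectors, $n$ and $-n$, and we shall choose one of them. For example, we may
set

$$
n = \frac{\operatorname{grad} F}{|\operatorname{grad} F|}(0). \tag{7.66}
$$

We define the vector $\frac{d^2 x}{dt^2}$ as $\frac{d}{dt}\left( \frac{dx}{dt} \right)$ and set

$$
Q = \left( \frac{d^2 x}{dt^2}(0),\, n \right). \tag{7.67}
$$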

Proposition 7.46 The value $Q$ depends only on the vector $p$; namely, it is a
quadratic form in its coordinates.


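Proof It suffices to verify this assertion by substituting in (7.67) for the vector $n$,
any vector proportional to it, for example, $\operatorname{grad} F(0)$. Since by assumption, the curve
$x(t)$ is contained in the hypersurface (7.64), it follows that $F(x_1(t), \ldots, x_n(t)) = 0$.
Differentiating this equality twice with respect to $t$, we obtain

$$
\sum_{i=1}^{n} \frac{\partial F}{\partial x_i} \frac{dx_i}{dt} = 0, \qquad
\sum_{i,j=1}^{n} \frac{\partial^2 F}{\partial x_i \partial x_j} \frac{dx_i}{dt} \frac{dx_j}{dt}
+ \sum_{i=1}^{n} \frac{\partial F}{\partial x_i} \frac{d^2 x_i}{dt^2} = 0.
$$

Setting here $t = 0$, we see that

$$
\left( \frac{d^2 x}{dt^2}(0),\, \operatorname{grad} F(0) \right)
= -\sum_{i,j=1}^{n} \frac{\partial^2 F}{\partial x_i \partial x_j}(0)\, p_i p_j,
$$

where $p = (p_1, \ldots, p_n)$. This proves the assertion. □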
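
The identity at the end of this proof can be checked on a concrete hypersurface. The sketch below is an illustration and not part of the original text: it assumes the sphere $F(x, y, z) = x^2 + y^2 + (z + 1)^2 - 1$, which passes through 0, together with a hypothetical curve on it, and it estimates the second derivative by a central difference.

```python
import math


def curve(t, a=0.6, b=0.8):
    """A curve on the sphere x^2 + y^2 + (z + 1)^2 = 1 through 0 (requires a^2 + b^2 = 1).

    Its tangent vector at t = 0 is p = (a, b, 0).
    """
    return (a * math.sin(t), b * math.sin(t), math.cos(t) - 1.0)


def second_derivative_at_zero(x, h=1e-4):
    """Central-difference estimate of d^2x/dt^2 at t = 0, componentwise."""
    return tuple((p - 2 * q + m) / (h * h) for p, q, m in zip(x(h), x(0.0), x(-h)))


GRAD_F_AT_0 = (0.0, 0.0, 2.0)   # gradient of F = x^2 + y^2 + (z + 1)^2 - 1 at the point 0
HESSIAN = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]   # second partials of F


def both_sides(a=0.6, b=0.8):
    """Return (d^2x/dt^2(0), grad F(0)) and -sum F_ij(0) p_i p_j for the curve above."""
    acc = second_derivative_at_zero(lambda t: curve(t, a, b))
    left = sum(u * g for u, g in zip(acc, GRAD_F_AT_0))
    p = (a, b, 0.0)
    right = -sum(HESSIAN[i][j] * p[i] * p[j] for i in range(3) for j in range(3))
    return left, right


print(both_sides())   # both values are close to -2
```

Dividing by $|\operatorname{grad} F(0)| = 2$ gives $Q(p) = -1$ for this unit tangent vector, with the unit vector $n$ chosen as in (7.66).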



The form $Q(p)$ is called the second quadratic form of the hypersurface. The
form $(p^2)$ is called the first quadratic form when $T_0 X$ is taken as a subspace of a
Euclidean space $L$. We observe that the second quadratic form requires the
selection of one of two unit vectors ($n$ or $-n$) orthogonal to $T_0 X$. This is frequently
interpreted as the selection of one side of the hypersurface in a neighborhood of the
point 0.

The first and second quadratic forms give us the possibility to obtain an
expression for the curvature of certain curves $x(t)$ lying in the hypersurface $X$. Let us
suppose that a curve is the intersection of a plane $L'$ containing the point 0 and the
hypersurface $X$ (even if only in an arbitrarily small neighborhood of the point 0).
Such a curve is called a plane section of the hypersurface. If we define the curve
$x(t)$ in such a way that $t$ is a natural parameter, then its curvature at the point 0 is
the number

$$
k = \left| \frac{d^2 x}{dt^2}(0) \right|.
$$

We assume that $k \ne 0$ and set

$$
m = \frac{1}{k} \cdot \frac{d^2 x}{dt^2}(0).
$$

The vector $m$ has length 1 by definition. It is said to be normal to the curve $x(t)$ at
the point 0. If the curve $x(t)$ is a plane section of the hypersurface, then $x(t)$ lies in
the plane $L'$ (for all sufficiently small $t$), and consequently, the vector

$$
\frac{dx}{dt} = \lim_{h \to 0} \frac{x(t + h) - x(t)}{h}
$$

also lies in the plane $L'$. Therefore, this holds as well for the vector $d^2 x/dt^2$, which
implies that it holds as well for the normal $m$. If the curve $\gamma$ is defined in terms of
the natural parameter $t$, then

$$
\left| \frac{dx}{dt} \right|^2 = \left( \frac{dx}{dt}, \frac{dx}{dt} \right) = 1.
$$

Differentiating this equality with respect to $t$, we obtain that the vectors $d^2 x/dt^2$
and $dx/dt$ are orthogonal. Hence the normal $m$ to the curve $\gamma$ is orthogonal to an
arbitrary tangent vector (for arbitrary definition of the curve $\gamma$ in the form $x(t)$ with
natural parameter $t$), and the vector $m$ is defined uniquely up to sign. It is obvious
that $L' = \langle m, p \rangle$, where $p$ is an arbitrary tangent vector.

By definition (7.67) of the second quadratic form $Q$ and taking into account the
equality $|m| = |n| = 1$, we obtain the expression

$$
Q(p) = (km, n) = k(m, n) = k \cos \varphi, \tag{7.68}
$$

where $\varphi$ is the angle between the vectors $m$ and $n$. The expression $k \cos \varphi$ is denoted
by $\bar{k}$ and is called the normal curvature of the hypersurface $X$ in the direction $p$.
We recall that here $n$ denotes the chosen unit vector orthogonal to the tangent space
$T_0 X$, and $m$ is the normal to the curve to which the vector $p$ is tangent. An
analogous formula for an arbitrary parametric definition of the curve $x(t)$ (where $t$ is not
necessarily a natural parameter) also uses the first quadratic form. Namely, if $\tau$ is
another parameter, while $t$ is a natural parameter, then by formula (7.65), now
instead of the vector $p$, we obtain $p' = p\,\psi'(0)$. Since $Q$ is a quadratic form, it follows
that $Q(p\,\psi'(0)) = \psi'(0)^2 Q(p)$, and instead of formula (7.68), we now obtain

$$
\frac{Q(p)}{(p^2)} = k \cos \varphi. \tag{7.69}
$$

Here the first quadratic form $(p^2)$ is already involved as well as the second quadratic
form $Q(p)$, but now (7.69), in contrast to (7.68), holds for an arbitrary choice of
parameter $t$ on the curve $\gamma$.

The point of the term normal curvature given above is the following. The section
of the hypersurface $X$ by the plane $L'$ is said to be normal if $n \in L'$. The vector $n$
defined by formula (7.66) is orthogonal to the tangent plane $T_0 X$. But in the plane $L'$
there is also the vector $p$ tangent to the curve $\gamma$, and the normal vector $m$ orthogonal
to it. Thus in the case of a normal section $n = \pm m$, this means that in formula (7.68),
the angle $\varphi$ is equal to 0 or $\pi$. Conversely, from the equality $|\cos \varphi| = 1$, it follows
that $n \in L'$. Thus in the case of a normal section, the normal curvature $\bar{k}$ differs from
$k$ only by the factor $\pm 1$ and is defined by the relationship

$$
\bar{k} = \frac{Q(p)}{(p^2)}.
$$

Since $L' = \langle m, p \rangle$, it follows that all normal sections correspond to straight lines in
the plane $L'$. For each line, there exists a unique normal section containing this line.
In other words, we "rotate" the plane $L'$ about the vector $m$, considering all obtained
planes $\langle m, p \rangle$, where $p$ is a vector in the tangent hyperplane $T_0 X$. Thus all normal
sections of the hypersurface $X$ are obtained.

We shall now employ Theorem 7.38. In our case, it gives an orthonormal basis
$e_1, \ldots, e_{n-1}$ in the tangent hyperplane $T_0 X$ (viewed as a subspace of the Euclidean
space $L$) in which the quadratic form $Q(p)$ is brought into canonical form. In other
words, for the vector $p = u_1 e_1 + \cdots + u_{n-1} e_{n-1}$, the second quadratic form takes
the form $Q(p) = \lambda_1 u_1^2 + \cdots + \lambda_{n-1} u_{n-1}^2$. Since the basis $e_1, \ldots, e_{n-1}$ is
orthonormal, we have in this case

$$
\frac{u_i}{|p|} = \frac{(p, e_i)}{|p|} = \cos \varphi_i, \tag{7.70}
$$

where $\varphi_i$ is the angle between the vectors $p$ and $e_i$. From this we obtain for the
normal curvature $\bar{k}$ of the normal section $\gamma$, the formula

$$
\bar{k} = \frac{Q(p)}{|p|^2} = \sum_{i=1}^{n-1} \lambda_i \cos^2 \varphi_i, \tag{7.71}
$$

where $p$ is an arbitrary tangent vector to the curve $\gamma$ at the point 0. Relationships
(7.70) and (7.71) are called Euler's formula. The numbers $\lambda_i$ are called principal
curvatures of the hypersurface $X$ at the point 0.

In the case $n = 3$, the hypersurface (7.64) is an ordinary surface and has two
principal curvatures $\lambda_1$ and $\lambda_2$. Taking into account the fact that $\cos^2 \varphi_1 + \cos^2 \varphi_2 = 1$,
Euler's formula takes the form

$$
\bar{k} = \lambda_1 \cos^2 \varphi_1 + \lambda_2 \cos^2 \varphi_2 = (\lambda_1 - \lambda_2) \cos^2 \varphi_1 + \lambda_2. \tag{7.72}
$$

Suppose $\lambda_1 \ge \lambda_2$. Then from (7.72), it is clear that the normal curvature $\bar{k}$
assumes a maximum (equal to $\lambda_1$) for $\cos^2 \varphi_1 = 1$ and a minimum (equal to $\lambda_2$) for
$\cos^2 \varphi_1 = 0$. This assertion is called the extremal property of the principal
curvatures of the surface. If $\lambda_1$ and $\lambda_2$ have the same sign ($\lambda_1 \lambda_2 > 0$), then as can be
seen from (7.72), an arbitrary normal section of a surface at a given point 0 has
its curvature of the same sign, and therefore, all normal sections have convexity in
the same direction, and near the point 0, the surface lies on one side of its tangent
plane; see Fig. 7.10(a). Such points are called elliptic. If $\lambda_1$ and $\lambda_2$ have
different signs ($\lambda_1 \lambda_2 < 0$), then as can be seen from formula (7.72), there exist normal
sections with opposite directions of convexity, and at points near 0, the surface is
located on different sides of its tangent plane; see Fig. 7.10(b). Such points are called
hyperbolic.⁷

Fig. 7.10 Elliptic (a) and hyperbolic (b) points
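
Euler's formula (7.72) and the classification of points lend themselves to a short numerical check. The following is a minimal sketch, not part of the original text: the function names and the sample principal curvatures are illustrative, and the case of a vanishing product, which is not discussed here, is reported separately.

```python
import math


def normal_curvature(lam1, lam2, phi1):
    """Euler's formula for n = 3: (lam1 - lam2) * cos^2(phi1) + lam2."""
    return (lam1 - lam2) * math.cos(phi1) ** 2 + lam2


def classify(lam1, lam2):
    """Type of the point according to the sign of the product lam1 * lam2."""
    product = lam1 * lam2
    if product > 0:
        return "elliptic"
    if product < 0:
        return "hyperbolic"
    return "not covered by the text (product is zero)"


def curvature_range(lam1, lam2, steps=360):
    """Smallest and largest normal curvature over a grid of directions phi1."""
    values = [normal_curvature(lam1, lam2, math.pi * k / steps) for k in range(steps + 1)]
    return min(values), max(values)


for lam1, lam2 in [(2.0, 1.0), (2.0, -1.0)]:
    print(classify(lam1, lam2), curvature_range(lam1, lam2))
```

In the elliptic sample all normal curvatures lie between 1 and 2 and share one sign; in the hyperbolic sample they range from about -1 to 2, so sections of both signs occur.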

From all this discussion, it is evident that the product of principal curvatures
$K = \lambda_1 \lambda_2$ characterizes some important properties of a surface (called "internal
geometric properties" of the surface). This product is called the Gaussian or total
curvature of the surface.


7.7 Pseudo-Euclidean Spaces 

Many of the theorems proved in the previous sections of this chapter remain valid
if in the definition of Euclidean space we forgo the requirement of positive definiteness
of the quadratic form $(\boldsymbol{x}^2)$ or replace it with something weaker. Without this
condition, the inner product $(\boldsymbol{x}, \boldsymbol{y})$ does not differ at all from an arbitrary symmetric
bilinear form. As Theorem 6.6 shows, it is uniquely defined by the quadratic form
$(\boldsymbol{x}^2)$.

We thus obtain a theory that fully coincides with the theory of quadratic
forms that we presented in Chap. 6. The fundamental theorem (on bringing a
quadratic form into canonical form) consists in the existence of an orthogonal
basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, that is, a basis for which $(\boldsymbol{e}_i, \boldsymbol{e}_j) = 0$ for all $i \ne j$. Then for the
vector $x_1\boldsymbol{e}_1 + \cdots + x_n\boldsymbol{e}_n$, the quadratic form $(\boldsymbol{x}^2)$ is equal to $\lambda_1 x_1^2 + \cdots + \lambda_n x_n^2$,
where $\lambda_i = (\boldsymbol{e}_i^2)$.
Moreover, this is true for vector spaces and bilinear forms over an arbitrary field $K$
of characteristic different from 2. The concept of an isomorphism of spaces makes
sense also in this case; as previously, it is necessary to require that the scalar product
$(\boldsymbol{x}, \boldsymbol{y})$ be preserved.

⁷ Examples of surfaces consisting entirely of elliptic points are ellipsoids, hyperboloids of two
sheets, and elliptic paraboloids, while surfaces consisting entirely of hyperbolic points include
hyperboloids of one sheet and hyperbolic paraboloids.

The theory of such spaces (defined up to isomorphism) with a bilinear or 
quadratic form is of great interest (for example, in the case K = Q, the field of 
rational numbers). But here we are interested in real spaces. In this case, formula 
(6.28) and Theorem 6.17 (law of inertia) show that up to isomorphism, a space is 
uniquely defined by its rank and the index of inertia of the associated quadratic form. 

We shall further restrict attention to real vector spaces with a
nonsingular symmetric bilinear form $(\boldsymbol{x}, \boldsymbol{y})$. Let us recall that the nonsingularity of
a bilinear form means that its rank (that is, the rank of its matrix in an arbitrary
basis of the space) is equal to $\dim \mathsf{L}$. In other words, its radical is
equal to $(\boldsymbol{0})$; that is, if the vector $\boldsymbol{x}$ is such that $(\boldsymbol{x}, \boldsymbol{y}) = 0$ for all vectors $\boldsymbol{y} \in \mathsf{L}$, then
$\boldsymbol{x} = \boldsymbol{0}$ (see Sect. 6.2). For a Euclidean space, this condition follows automatically
from property (4) of the definition (it suffices to set there $\boldsymbol{y} = \boldsymbol{x}$).

Formula (6.28) shows that with these conditions, there exists a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$
of the space $\mathsf{L}$ for which

$$(\boldsymbol{e}_i, \boldsymbol{e}_j) = 0 \ \text{ for } i \ne j, \qquad (\boldsymbol{e}_i^2) = \pm 1.$$

Such a basis is called, as it was previously, orthonormal. In it, the form $(\boldsymbol{x}^2)$ can be
written in the form

$$(\boldsymbol{x}^2) = x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_n^2,$$

and the number $s$ is called the index of inertia of both the quadratic form $(\boldsymbol{x}^2)$ and
the pseudo-Euclidean space $\mathsf{L}$.

A new difficulty appears that was not present for Euclidean spaces if the quadratic
form $(\boldsymbol{x}^2)$ is neither positive nor negative definite, that is, if its index of inertia $s$ is
positive but less than $n$. In this case, the restriction of the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ to a
subspace $\mathsf{L}' \subset \mathsf{L}$ can turn out to be singular, even if the original bilinear form $(\boldsymbol{x}, \boldsymbol{y})$
in $\mathsf{L}$ was nonsingular. For example, it is clear that in $\mathsf{L}$, there exists a vector $\boldsymbol{x} \ne \boldsymbol{0}$
for which $(\boldsymbol{x}^2) = 0$, and then the restriction of $(\boldsymbol{x}, \boldsymbol{y})$ to the one-dimensional subspace
$\langle \boldsymbol{x} \rangle$ is singular (identically equal to zero).

Thus let us consider a vector space $\mathsf{L}$ with a nonsingular symmetric bilinear form
$(\boldsymbol{x}, \boldsymbol{y})$ defined on it. In this case, we shall use many concepts and much of the notation
used for Euclidean spaces earlier. Hence, vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are called orthogonal
if $(\boldsymbol{x}, \boldsymbol{y}) = 0$. Subspaces $\mathsf{L}_1$ and $\mathsf{L}_2$ are called orthogonal if $(\boldsymbol{x}, \boldsymbol{y}) = 0$ for all vectors
$\boldsymbol{x} \in \mathsf{L}_1$ and $\boldsymbol{y} \in \mathsf{L}_2$, and we express this by writing $\mathsf{L}_1 \perp \mathsf{L}_2$. The orthogonal complement
of the subspace $\mathsf{L}' \subset \mathsf{L}$ with respect to the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ is denoted by
$(\mathsf{L}')^{\perp}$. However, there is an important difference from the case of Euclidean spaces,
in connection with which it will be useful to give the following definition.

Definition 7.47 A subspace $\mathsf{L}' \subset \mathsf{L}$ is said to be nondegenerate if the bilinear form
obtained by restricting the form $(\boldsymbol{x}, \boldsymbol{y})$ to $\mathsf{L}'$ is nonsingular. In the contrary case, $\mathsf{L}'$
is said to be degenerate.

By Theorem 6.9, in the case of a nondegenerate subspace $\mathsf{L}'$ we have the orthogonal
decomposition

$$\mathsf{L} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}. \tag{7.73}$$

In the case of a Euclidean space, as we have seen, every subspace $\mathsf{L}'$ is nondegenerate,
and the decomposition (7.73) holds without any additional conditions. As the
following example will show, in a pseudo-Euclidean space, the condition of nondegeneracy
of a subspace $\mathsf{L}'$ for the decomposition (7.73) is in fact essential.

Example 7.48 Let us consider a three-dimensional space $\mathsf{L}$ with a symmetric bilinear
form defined in some chosen basis by the formula

$$(\boldsymbol{x}, \boldsymbol{y}) = x_1 y_1 + x_2 y_2 - x_3 y_3,$$

where the $x_i$ are the coordinates of the vector $\boldsymbol{x}$, and the $y_i$ are the coordinates
of the vector $\boldsymbol{y}$. Let $\mathsf{L}' = \langle \boldsymbol{e} \rangle$, where the vector $\boldsymbol{e}$ has coordinates $(0, 1, 1)$. Then
as is easily verified, $(\boldsymbol{e}, \boldsymbol{e}) = 0$, and therefore, the restriction of the form $(\boldsymbol{x}, \boldsymbol{y})$ to
$\mathsf{L}'$ is identically equal to zero. This implies that the subspace $\mathsf{L}'$ is degenerate. Its
orthogonal complement $(\mathsf{L}')^{\perp}$ is two-dimensional and consists of all vectors $\boldsymbol{z} \in \mathsf{L}$
with coordinates $(z_1, z_2, z_3)$ for which $z_2 = z_3$. Consequently, $\mathsf{L}' \subset (\mathsf{L}')^{\perp}$, and the
intersection $\mathsf{L}' \cap (\mathsf{L}')^{\perp} = \mathsf{L}'$ contains nonnull vectors. This implies that the sum
$\mathsf{L}' + (\mathsf{L}')^{\perp}$ is not a direct sum. Furthermore, it is obvious that $\mathsf{L}' + (\mathsf{L}')^{\perp} \ne \mathsf{L}$.
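
The computations of this example are easy to replay. The sketch below assumes only the formula for the form given in the example; the helper name `form` and the choice of spanning vectors for the complement are illustrative.

```python
def form(x, y):
    """Bilinear form of Example 7.48: x1*y1 + x2*y2 - x3*y3."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

e = (0, 1, 1)  # spans the subspace L'

# The restriction of the form to L' = <e> is identically zero.
print(form(e, e))

# z is orthogonal to e exactly when z2 - z3 = 0; two vectors spanning that
# plane are (1, 0, 0) and (0, 1, 1), and the second one is e itself.
complement = [(1, 0, 0), (0, 1, 1)]
print([form(e, z) for z in complement])

# (0, 0, 1) is not orthogonal to e, so it lies outside (L')^perp, which here
# coincides with the sum L' + (L')^perp because L' is contained in (L')^perp.
print(form(e, (0, 0, 1)))
```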

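It follows from the nonsingularity of a bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ that the determinant
of its matrix (in an arbitrary basis) is different from zero. If this matrix is written in
the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, then its determinant is equal to

$$\begin{vmatrix}
(\boldsymbol{e}_1, \boldsymbol{e}_1) & (\boldsymbol{e}_1, \boldsymbol{e}_2) & \cdots & (\boldsymbol{e}_1, \boldsymbol{e}_n) \\
(\boldsymbol{e}_2, \boldsymbol{e}_1) & (\boldsymbol{e}_2, \boldsymbol{e}_2) & \cdots & (\boldsymbol{e}_2, \boldsymbol{e}_n) \\
\vdots & \vdots & \ddots & \vdots \\
(\boldsymbol{e}_n, \boldsymbol{e}_1) & (\boldsymbol{e}_n, \boldsymbol{e}_2) & \cdots & (\boldsymbol{e}_n, \boldsymbol{e}_n)
\end{vmatrix}, \tag{7.74}$$

and just as in the case of a Euclidean space, we shall call it the Gram determinant
of the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. Of course, this determinant depends on the choice of
basis, but its sign does not depend on the basis. Indeed, if $A$ and $A'$ are the matrices
of our bilinear form in two different bases, then they are related by the equality
$A' = C^{*} A C$, where $C$ is a nonsingular transition matrix, from which it follows that
$|A'| = |A| \cdot |C|^2$. Thus the sign of the Gram determinant is the same for all bases.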
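
The invariance of the sign can be illustrated with exact integer arithmetic. This is a sketch under stated assumptions: the matrices `A` and `C` below are arbitrary sample data, the helper functions are written for this illustration, and $C^{*}$ is taken to be the transpose of $C$.

```python
from itertools import permutations

def det(m):
    """Determinant by the Leibniz formula (adequate for small integer matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

A = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]      # Gram matrix in an orthonormal basis (n = 3, s = 2)
C = [[2, 1, 0], [1, 1, 3], [0, 5, 1]]       # a sample nonsingular transition matrix
A_new = matmul(matmul(transpose(C), A), C)  # A' = C* A C

print(det(A), det(C), det(A_new))
```

The determinant of `A_new` equals `det(A) * det(C) ** 2`, so it has the same sign as `det(A)`.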

As noted above, for a nondegenerate subspace $\mathsf{L}' \subset \mathsf{L}$, we have the decomposition
(7.73), which yields the equality

$$\dim \mathsf{L} = \dim \mathsf{L}' + \dim (\mathsf{L}')^{\perp}. \tag{7.75}$$

But equality (7.75) holds as well for every subspace $\mathsf{L}' \subset \mathsf{L}$, although as we saw in
Example 7.48, the decomposition (7.73) may fail to hold in the general case.

Indeed, by Theorem 6.3, we can write an arbitrary bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ in the
space $\mathsf{L}$ in the form $(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$, where $\mathcal{A} : \mathsf{L} \to \mathsf{L}^{*}$ is some linear transformation.
From the nonsingularity of the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ follows the nonsingularity
of the transformation $\mathcal{A}$. In other words, the transformation $\mathcal{A}$ is an isomorphism,
that is, its kernel is equal to $(\boldsymbol{0})$, and in particular, for an arbitrary subspace $\mathsf{L}' \subset \mathsf{L}$,
we have the equality $\dim \mathcal{A}(\mathsf{L}') = \dim \mathsf{L}'$. On the other hand, we can write the orthogonal
complement $(\mathsf{L}')^{\perp}$ in the form $(\mathcal{A}(\mathsf{L}'))^{a}$, using the notion of the annihilator
introduced in Sect. 3.7. On the basis of what we have said above and formula (3.54)
for the annihilator, we have the relationship

$$\dim (\mathcal{A}(\mathsf{L}'))^{a} = \dim \mathsf{L} - \dim \mathcal{A}(\mathsf{L}') = \dim \mathsf{L} - \dim \mathsf{L}',$$

that is, $\dim (\mathsf{L}')^{\perp} = \dim \mathsf{L} - \dim \mathsf{L}'$. We note that this argument holds for vector
spaces $\mathsf{L}$ defined not only over the real numbers, but over any field.

The spaces that we have examined are defined (up to isomorphism) by the index
of inertia $s$, which can take values from 0 to $n$. By what we have said above, the sign
of the Gram determinant of an arbitrary basis is equal to $(-1)^{n-s}$. It is obvious that
if we replace the inner product $(\boldsymbol{x}, \boldsymbol{y})$ in the space $\mathsf{L}$ by $-(\boldsymbol{x}, \boldsymbol{y})$, we shall preserve all
of its essential properties, but the index of inertia $s$ will be replaced by $n - s$, whence
in what follows, we shall assume that $n/2 \le s \le n$. The case $s = n$ corresponds
to a Euclidean space. There exists, however, a phenomenon whose explanation is
at present not completely clear: the most interesting questions in mathematics and
physics have until now been connected with two types of spaces, those in which the index
of inertia $s$ is equal to $n$ and those for which $s = n - 1$. The theory of Euclidean
spaces ($s = n$) has been up till now the topic of this chapter. In the remaining part,
we shall consider the other case: $s = n - 1$. In the sequel, we shall call such spaces
pseudo-Euclidean spaces (although sometimes, this term is used when $(\boldsymbol{x}, \boldsymbol{y})$ is an
arbitrary nonsingular symmetric bilinear form neither positive nor negative definite,
that is, with index of inertia $s \ne 0, n$).

Thus a pseudo-Euclidean space of dimension $n$ is a vector space $\mathsf{L}$ equipped with
a symmetric bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ such that in some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, the quadratic
form $(\boldsymbol{x}^2)$ takes the form

$$x_1^2 + \cdots + x_{n-1}^2 - x_n^2. \tag{7.76}$$

As in the case of a Euclidean space, we shall call such bases orthonormal.

The best-known application of pseudo-Euclidean spaces is related to the special
theory of relativity. According to an idea put forward by Minkowski, in this theory,
one considers a four-dimensional space whose vectors are called space-time events
(we mentioned this earlier, on p. 86). They have coordinates $(x, y, z, t)$, and the
space is equipped with the quadratic form $x^2 + y^2 + z^2 - t^2$ (here the speed of light
is assumed to be 1). The pseudo-Euclidean space thus obtained is called Minkowski
space. By analogy with the physical sense of these concepts in Minkowski space, in
an arbitrary pseudo-Euclidean space, a vector $\boldsymbol{x}$ is said to be spacelike if $(\boldsymbol{x}^2) > 0$,
while such a vector is said to be timelike if $(\boldsymbol{x}^2) < 0$, and lightlike, or isotropic, if
$(\boldsymbol{x}^2) = 0$.⁸

Fig. 7.11 A pseudo-Euclidean plane
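
The three classes of vectors can be told apart by the sign of the scalar square. The sketch below assumes coordinates in an orthonormal basis with the single negative square in the last place, as in the quadratic form $x^2 + y^2 + z^2 - t^2$; the function names are illustrative.

```python
def scalar_square(x):
    """Scalar square in an orthonormal basis: x1^2 + ... + x_{n-1}^2 - x_n^2."""
    return sum(c * c for c in x[:-1]) - x[-1] ** 2

def kind(x):
    """Classify a vector using the sign convention of the text (s = n - 1)."""
    q = scalar_square(x)
    if q > 0:
        return "spacelike"
    if q < 0:
        return "timelike"
    return "lightlike"

# Coordinates (x, y, z, t) in Minkowski space, with the speed of light taken as 1.
print(kind((1, 0, 0, 0)), kind((0, 0, 0, 1)), kind((3, 4, 0, 5)))
```

With this convention the time axis is timelike, a purely spatial displacement is spacelike, and $(3, 4, 0, 5)$ is lightlike because $9 + 16 - 25 = 0$.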


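Example 7.49 Let us consider the simplest case of a pseudo-Euclidean space $\mathsf{L}$ with
$\dim \mathsf{L} = 2$ and index of inertia $s = 1$. By the general theory, in this space there exists
an orthonormal basis, in this case the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, for which

$$(\boldsymbol{e}_1^2) = 1, \qquad (\boldsymbol{e}_2^2) = -1, \qquad (\boldsymbol{e}_1, \boldsymbol{e}_2) = 0, \tag{7.77}$$

and the scalar square of the vector $\boldsymbol{x} = x_1\boldsymbol{e}_1 + x_2\boldsymbol{e}_2$ is equal to $(\boldsymbol{x}^2) = x_1^2 - x_2^2$.
However, it is easier to write the formulas connected with the space $\mathsf{L}$ in the basis
consisting of lightlike vectors $\boldsymbol{f}_1, \boldsymbol{f}_2$, after setting

$$\boldsymbol{f}_1 = \frac{\boldsymbol{e}_1 + \boldsymbol{e}_2}{2}, \qquad \boldsymbol{f}_2 = \frac{\boldsymbol{e}_1 - \boldsymbol{e}_2}{2}. \tag{7.78}$$

Then $(\boldsymbol{f}_1^2) = (\boldsymbol{f}_2^2) = 0$, $(\boldsymbol{f}_1, \boldsymbol{f}_2) = \frac{1}{2}$, and the scalar square of the vector
$\boldsymbol{x} = x_1\boldsymbol{f}_1 + x_2\boldsymbol{f}_2$ is equal to $(\boldsymbol{x}^2) = x_1 x_2$. The lightlike vectors are located on the
coordinate axes; see Fig. 7.11. The timelike vectors comprise the second and fourth
quadrants, and the spacelike vectors make up the first and third quadrants.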
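
The stated values of the inner products in the lightlike basis can be verified with exact fractions. The sketch assumes $\boldsymbol{f}_1 = (\boldsymbol{e}_1 + \boldsymbol{e}_2)/2$ and $\boldsymbol{f}_2 = (\boldsymbol{e}_1 - \boldsymbol{e}_2)/2$, a choice consistent with $(\boldsymbol{f}_1, \boldsymbol{f}_2) = \frac{1}{2}$; the helper names are illustrative.

```python
from fractions import Fraction

def form(x, y):
    """Inner product in the orthonormal basis e1, e2 with (e1^2) = 1, (e2^2) = -1."""
    return x[0] * y[0] - x[1] * y[1]

half = Fraction(1, 2)
f1 = (half, half)    # assumed: (e1 + e2) / 2
f2 = (half, -half)   # assumed: (e1 - e2) / 2

def from_lightlike(x1, x2):
    """Coordinates in e1, e2 of the vector x1*f1 + x2*f2."""
    return (x1 * f1[0] + x2 * f2[0], x1 * f1[1] + x2 * f2[1])

print(form(f1, f1), form(f2, f2), form(f1, f2))  # 0 0 1/2
v = from_lightlike(3, -2)
print(form(v, v))  # -6, equal to x1 * x2: a timelike vector in the fourth quadrant
```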


Definition 7.50 The set $V \subset \mathsf{L}$ consisting of all lightlike vectors of a pseudo-Euclidean
space is called the light cone (or isotropic cone).


That we call the set $V$ a cone suggests that if it contains some vector $\boldsymbol{e}$, then it
contains the entire straight line $\langle \boldsymbol{e} \rangle$, which follows at once from the definition. The
set of timelike vectors is called the interior of the cone $V$, while the set of spacelike
vectors makes up its exterior. In the space from Example 7.49, the light cone $V$ is
the union of two straight lines $\langle \boldsymbol{f}_1 \rangle$ and $\langle \boldsymbol{f}_2 \rangle$. A more visual representation of the
light cone is given by the following example.


⁸ We remark that this terminology differs from what is generally used: Our "spacelike" vectors are
usually called "timelike," and conversely. The difference is explained by the condition $s = n - 1$
that we have assumed. In the conventional definition of Minkowski space, one usually considers
the quadratic form $-x^2 - y^2 - z^2 + t^2$, with index of inertia $s = 1$, and we need to multiply it by
$-1$ in order that the condition $s \ge n/2$ be satisfied.

Fig. 7.12 The light cone



Example 7.51 We consider the pseudo-Euclidean space $\mathsf{L}$ with $\dim \mathsf{L} = 3$ and index
of inertia $s = 2$. With the selection of an orthonormal basis $\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3$ such that

$$(\boldsymbol{e}_1^2) = (\boldsymbol{e}_2^2) = 1, \qquad (\boldsymbol{e}_3^2) = -1, \qquad (\boldsymbol{e}_i, \boldsymbol{e}_j) = 0 \ \text{ for all } i \ne j,$$

the light cone $V$ is defined by the equation $x_1^2 + x_2^2 - x_3^2 = 0$. This is an ordinary
right circular cone in three-dimensional space, familiar from a course in analytic
geometry; see Fig. 7.12.

We now return to the general case of a pseudo-Euclidean space $\mathsf{L}$ of dimension $n$
and consider the light cone $V$ in $\mathsf{L}$ in greater detail. First of all, let us verify that it is
"completely circular." By this we mean the following.

Lemma 7.52 Although the cone $V$ contains along with every vector $\boldsymbol{x}$ the entire
line $\langle \boldsymbol{x} \rangle$, it contains no two-dimensional subspace.

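Proof Let us assume that $V$ contains a two-dimensional subspace $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$. We choose
a vector $\boldsymbol{e} \in \mathsf{L}$ such that $(\boldsymbol{e}^2) = -1$. Then the line $\langle \boldsymbol{e} \rangle$ is a nondegenerate subspace of
$\mathsf{L}$, and we can use the decomposition (7.73):

$$\mathsf{L} = \langle \boldsymbol{e} \rangle \oplus \langle \boldsymbol{e} \rangle^{\perp}. \tag{7.79}$$

From the law of inertia it follows that $\langle \boldsymbol{e} \rangle^{\perp}$ is a Euclidean space. Let us apply the
decomposition (7.79) to our vectors $\boldsymbol{x}, \boldsymbol{y} \in V$. We obtain

$$\boldsymbol{x} = \alpha\boldsymbol{e} + \boldsymbol{u}, \qquad \boldsymbol{y} = \beta\boldsymbol{e} + \boldsymbol{v}, \tag{7.80}$$

where $\boldsymbol{u}$ and $\boldsymbol{v}$ are vectors in the Euclidean space $\langle \boldsymbol{e} \rangle^{\perp}$, while $\alpha$ and $\beta$ are some
scalars.

The conditions $(\boldsymbol{x}^2) = 0$ and $(\boldsymbol{y}^2) = 0$ can be written as $\alpha^2 = (\boldsymbol{u}^2)$ and $\beta^2 = (\boldsymbol{v}^2)$.
Using the same reasoning for the vector $\boldsymbol{x} + \boldsymbol{y} = (\alpha + \beta)\boldsymbol{e} + \boldsymbol{u} + \boldsymbol{v}$, which by the
assumption $\langle \boldsymbol{x}, \boldsymbol{y} \rangle \subset V$ is also contained in $V$, we obtain the equality

$$(\alpha + \beta)^2 = (\boldsymbol{u} + \boldsymbol{v}, \boldsymbol{u} + \boldsymbol{v}) = (\boldsymbol{u}^2) + 2(\boldsymbol{u}, \boldsymbol{v}) + (\boldsymbol{v}^2) = \alpha^2 + 2(\boldsymbol{u}, \boldsymbol{v}) + \beta^2.$$

Canceling the terms $\alpha^2$ and $\beta^2$ on the left- and right-hand sides of the equality, we
obtain that $\alpha\beta = (\boldsymbol{u}, \boldsymbol{v})$, that is, $(\boldsymbol{u}, \boldsymbol{v})^2 = \alpha^2\beta^2 = (\boldsymbol{u}^2) \cdot (\boldsymbol{v}^2)$. Thus for the vectors
$\boldsymbol{u}$ and $\boldsymbol{v}$ in the Euclidean space $\langle \boldsymbol{e} \rangle^{\perp}$, the Cauchy-Schwarz inequality reduces to
an equality, from which it follows that $\boldsymbol{u}$ and $\boldsymbol{v}$ are proportional (see p. 218). Let
$\boldsymbol{v} = \lambda\boldsymbol{u}$. Then the vector $\boldsymbol{y} - \lambda\boldsymbol{x} = (\beta - \lambda\alpha)\boldsymbol{e}$ is also lightlike. Since $(\boldsymbol{e}^2) = -1$, it
follows that $\beta = \lambda\alpha$. But then from the relationship (7.80), it follows that $\boldsymbol{y} = \lambda\boldsymbol{x}$,
and this contradicts the assumption $\dim \langle \boldsymbol{x}, \boldsymbol{y} \rangle = 2$. □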

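Let us select an arbitrary timelike vector $\boldsymbol{e} \in \mathsf{L}$. Then in the orthogonal complement
$\langle \boldsymbol{e} \rangle^{\perp}$ of the line $\langle \boldsymbol{e} \rangle$, the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ determines a positive definite
quadratic form. This implies that $\langle \boldsymbol{e} \rangle^{\perp} \cap V = (\boldsymbol{0})$, and the hyperplane $\langle \boldsymbol{e} \rangle^{\perp}$ divides
the set $V \setminus \boldsymbol{0}$ into two parts, $V_+$ and $V_-$, consisting of the vectors $\boldsymbol{x} \in V$ for which
the condition $(\boldsymbol{e}, \boldsymbol{x}) > 0$ or $(\boldsymbol{e}, \boldsymbol{x}) < 0$ is respectively satisfied. We shall
call these sets $V_+$ and $V_-$ poles of the light cone $V$. In Fig. 7.12, the plane $\langle \boldsymbol{e}_1, \boldsymbol{e}_2 \rangle$
divides $V$ into "upper" and "lower" poles $V_+$ and $V_-$ for the vector $\boldsymbol{e} = \boldsymbol{e}_3$.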

The partition $V \setminus \boldsymbol{0} = V_+ \cup V_-$ that we have constructed rested on the choice of
some timelike vector $\boldsymbol{e}$, and ostensibly, it must depend on it (for example, a change
in the vector $\boldsymbol{e}$ to $-\boldsymbol{e}$ interchanges the poles $V_+$ and $V_-$). We shall now show that
the decomposition $V \setminus \boldsymbol{0} = V_+ \cup V_-$, without taking into account how we designate
each pole, does not depend on the choice of vector $\boldsymbol{e}$, that is, it is a property of
the pseudo-Euclidean space itself. To do so, we shall require the following, almost
obvious, assertion.

Lemma 7.53 Let $\mathsf{L}'$ be a subspace of the pseudo-Euclidean space $\mathsf{L}$ of dimension
$\dim \mathsf{L}' \ge 2$. Then the following statements are equivalent:

(1) $\mathsf{L}'$ is a pseudo-Euclidean space.

(2) $\mathsf{L}'$ contains a timelike vector.

(3) $\mathsf{L}'$ contains two linearly independent lightlike vectors.

Proof If $\mathsf{L}'$ is a pseudo-Euclidean space, then statements (2) and (3) obviously follow
from the definition of a pseudo-Euclidean space.

Let us show that statement (2) implies statement (1). Suppose $\mathsf{L}'$ contains a timelike
vector $\boldsymbol{e}$. That is, $(\boldsymbol{e}^2) < 0$, whence the subspace $\langle \boldsymbol{e} \rangle$ is nondegenerate, and
therefore, we have the decomposition (7.79), and moreover, as follows from the
law of inertia, the subspace $\langle \boldsymbol{e} \rangle^{\perp}$ is a Euclidean space. If the subspace $\mathsf{L}'$ were degenerate,
then there would exist a nonnull vector $\boldsymbol{u} \in \mathsf{L}'$ such that $(\boldsymbol{u}, \boldsymbol{x}) = 0$ for
all $\boldsymbol{x} \in \mathsf{L}'$, and in particular, for the vectors $\boldsymbol{e}$ and $\boldsymbol{u}$. The condition $(\boldsymbol{u}, \boldsymbol{e}) = 0$ implies
that the vector $\boldsymbol{u}$ is contained in $\langle \boldsymbol{e} \rangle^{\perp}$, while the condition $(\boldsymbol{u}, \boldsymbol{u}) = 0$ implies that
the vector $\boldsymbol{u}$ is lightlike. But this is impossible, since the subspace $\langle \boldsymbol{e} \rangle^{\perp}$ is a Euclidean
space and cannot contain nonnull lightlike vectors. This contradiction shows that the
subspace $\mathsf{L}'$ is nondegenerate, and therefore, it exhibits the decomposition (7.73).
Taking into account the law of inertia, it follows from this that the subspace $\mathsf{L}'$ is a
pseudo-Euclidean space.

Let us show that statement (3) implies statement (1). Suppose the subspace $\mathsf{L}'$
contains linearly independent lightlike vectors $\boldsymbol{f}_1$ and $\boldsymbol{f}_2$. We shall show that the
plane $\Pi = \langle \boldsymbol{f}_1, \boldsymbol{f}_2 \rangle$ contains a timelike vector $\boldsymbol{e}$. Then obviously, $\boldsymbol{e}$ is contained
in $\mathsf{L}'$, and by what was proved above, the subspace $\mathsf{L}'$ is a pseudo-Euclidean space.
Every vector $\boldsymbol{e} \in \Pi$ can be represented in the form $\boldsymbol{e} = \alpha\boldsymbol{f}_1 + \beta\boldsymbol{f}_2$. From this, we
obtain $(\boldsymbol{e}^2) = 2\alpha\beta(\boldsymbol{f}_1, \boldsymbol{f}_2)$. We note that $(\boldsymbol{f}_1, \boldsymbol{f}_2) \ne 0$, since in the contrary case,
for each vector $\boldsymbol{e} \in \Pi$, the equality $(\boldsymbol{e}^2) = 0$ would be satisfied, implying that the
plane $\Pi$ lies completely in the light cone $V$, which contradicts Lemma 7.52. Thus
$(\boldsymbol{f}_1, \boldsymbol{f}_2) \ne 0$, and choosing coordinates $\alpha$ and $\beta$ such that the sign of their product
is opposite to the sign of $(\boldsymbol{f}_1, \boldsymbol{f}_2)$, we obtain a vector $\boldsymbol{e}$ for which $(\boldsymbol{e}^2) < 0$. □

Fig. 7.13 The plane $\Pi$ in a three-dimensional pseudo-Euclidean space


Example 7.54 Let us consider the three-dimensional pseudo-Euclidean space $\mathsf{L}$
from Example 7.51 and a plane $\Pi$ in $\mathsf{L}$. The property of a plane $\Pi$ being a Euclidean
space, a pseudo-Euclidean space, or degenerate is clearly illustrated in Fig. 7.13.

In Fig. 7.13(a), the plane $\Pi$ intersects the light cone $V$ in two lines, corresponding
to two linearly independent lightlike vectors. Clearly, this is equivalent to the
condition that $\Pi$ also intersects the interior of the light cone, which consists of
timelike vectors, and therefore is a pseudo-Euclidean plane. In Fig. 7.13(c), it is
shown that the plane $\Pi$ intersects $V$ only in its vertex, that is, $\Pi \cap V = (\boldsymbol{0})$. This
implies that the plane $\Pi$ is a Euclidean space, since every nonnull vector $\boldsymbol{e} \in \Pi$ lies
outside the cone $V$, that is, $(\boldsymbol{e}^2) > 0$.

Finally, in Fig. 7.13(b) is shown the intermediate variant: the plane $\Pi$ intersects
the cone $V$ in a single line, that is, it is tangent to it. Since the plane $\Pi$ contains
lightlike vectors (lying on this line), it follows that it cannot be a Euclidean space,
and since it does not contain timelike vectors, it follows by Lemma 7.53 that it
cannot be a pseudo-Euclidean space. This implies that $\Pi$ is degenerate.

This is not difficult to verify in another way if we write down the matrix of the
restriction of the inner product to the plane $\Pi$. Suppose that in the orthonormal basis
$\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3$ from Example 7.51, this plane is defined by the equation $x_3 = \alpha x_1 + \beta x_2$.
Then the vectors $\boldsymbol{g}_1 = \boldsymbol{e}_1 + \alpha\boldsymbol{e}_3$ and $\boldsymbol{g}_2 = \boldsymbol{e}_2 + \beta\boldsymbol{e}_3$ form a basis of $\Pi$ in which
the restriction of the inner product has matrix

$$\begin{pmatrix} 1 - \alpha^2 & -\alpha\beta \\ -\alpha\beta & 1 - \beta^2 \end{pmatrix}$$

with determinant $\Delta = (1 - \alpha^2)(1 - \beta^2) - (\alpha\beta)^2$. On the other hand, the assumption of tangency of the
plane $\Pi$ and cone $V$ amounts to the discriminant of the quadratic form
$x_1^2 + x_2^2 - (\alpha x_1 + \beta x_2)^2$ in the variables $x_1$ and $x_2$ being equal to zero. It is easily verified that
this discriminant is equal to $-\Delta$, and this implies that it is zero precisely when the
determinant of this matrix is zero.
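
The three cases of Fig. 7.13 can be distinguished by the sign of the Gram determinant of the plane. The sketch below is an illustration rather than part of the argument above: it takes the plane in the form $x_3 = \alpha x_1 + \beta x_2$ with the basis $\boldsymbol{e}_1 + \alpha\boldsymbol{e}_3$, $\boldsymbol{e}_2 + \beta\boldsymbol{e}_3$, and it relies on the rule that a Gram determinant has sign $(-1)^{n-s}$, so a positive value means a Euclidean plane and a negative value a pseudo-Euclidean one. The function names and sample coefficients are assumptions.

```python
from fractions import Fraction

def gram_of_plane(alpha, beta):
    """Gram matrix of g1 = e1 + alpha*e3, g2 = e2 + beta*e3 for x1^2 + x2^2 - x3^2."""
    return [[1 - alpha * alpha, -alpha * beta],
            [-alpha * beta, 1 - beta * beta]]

def classify_plane(alpha, beta):
    """Classify the plane x3 = alpha*x1 + beta*x2 by the sign of its Gram determinant."""
    g = gram_of_plane(alpha, beta)
    delta = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if delta > 0:
        return "Euclidean"
    if delta < 0:
        return "pseudo-Euclidean"
    return "degenerate"

print(classify_plane(0, 0))                            # Euclidean: the plane x3 = 0
print(classify_plane(2, 0))                            # pseudo-Euclidean
print(classify_plane(Fraction(3, 5), Fraction(4, 5)))  # degenerate: tangent to the cone
```

Since the determinant simplifies to $1 - \alpha^2 - \beta^2$, the plane is tangent to the cone exactly when $\alpha^2 + \beta^2 = 1$.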
Theorem 7.55 The partition of the light cone $V$ into two poles $V_+$ and $V_-$ does
not depend on the choice of timelike vector $\boldsymbol{e}$. In particular, the linearly independent
lightlike vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ lie in one pole if and only if $(\boldsymbol{x}, \boldsymbol{y}) < 0$.

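Proof Let us assume that for some choice of timelike vector $\boldsymbol{e}$, the lightlike vectors
$\boldsymbol{x}$ and $\boldsymbol{y}$ lie in one pole of the light cone $V$, and let us show that then, for any other choice
of $\boldsymbol{e}$, they will again belong to the same pole. The case that the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are
proportional, that is, $\boldsymbol{y} = \lambda\boldsymbol{x}$, is obvious. Indeed, since $\langle \boldsymbol{e} \rangle^{\perp} \cap V = (\boldsymbol{0})$, it follows
that $(\boldsymbol{e}, \boldsymbol{x}) \ne 0$, and since $(\boldsymbol{e}, \boldsymbol{y}) = \lambda(\boldsymbol{e}, \boldsymbol{x})$, this implies that the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ belong to one pole if and
only if $\lambda > 0$, independent of the choice of the vector $\boldsymbol{e}$.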

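Now let us consider the case that $\boldsymbol{x}$ and $\boldsymbol{y}$ are linearly independent. Then
$(\boldsymbol{x}, \boldsymbol{y}) \ne 0$, since otherwise, the entire plane $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ would be contained in the light
cone $V$, which by Lemma 7.52 is impossible. Let us prove that regardless of what
timelike vector $\boldsymbol{e}$ we have chosen for the partition $V \setminus \boldsymbol{0} = V_+ \cup V_-$, the vectors
$\boldsymbol{x}, \boldsymbol{y} \in V \setminus \boldsymbol{0}$ belong to one pole if and only if $(\boldsymbol{x}, \boldsymbol{y}) < 0$. Let us note that this question,
strictly speaking, relates not to the entire space $\mathsf{L}$, but only to the subspace
$\langle \boldsymbol{e}, \boldsymbol{x}, \boldsymbol{y} \rangle$, whose dimension, by the assumptions we have made, is equal to 2 or 3,
depending on whether the vector $\boldsymbol{e}$ does or does not lie in the plane $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$.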

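Let us consider first the case $\dim \langle \boldsymbol{e}, \boldsymbol{x}, \boldsymbol{y} \rangle = 2$, that is, $\boldsymbol{e} \in \langle \boldsymbol{x}, \boldsymbol{y} \rangle$. Then let us set
$\boldsymbol{e} = \alpha\boldsymbol{x} + \beta\boldsymbol{y}$. Consequently, $(\boldsymbol{e}, \boldsymbol{x}) = \beta(\boldsymbol{x}, \boldsymbol{y})$ and $(\boldsymbol{e}, \boldsymbol{y}) = \alpha(\boldsymbol{x}, \boldsymbol{y})$, since $\boldsymbol{x}, \boldsymbol{y} \in V$.
By definition, vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are in the same pole if and only if $(\boldsymbol{e}, \boldsymbol{x})(\boldsymbol{e}, \boldsymbol{y}) > 0$.
But since $(\boldsymbol{e}, \boldsymbol{x})(\boldsymbol{e}, \boldsymbol{y}) = \alpha\beta(\boldsymbol{x}, \boldsymbol{y})^2$, this condition is equivalent to the inequality
$\alpha\beta > 0$. The vector $\boldsymbol{e}$ is timelike, and therefore, $(\boldsymbol{e}^2) < 0$, and in view of the equality
$(\boldsymbol{e}^2) = 2\alpha\beta(\boldsymbol{x}, \boldsymbol{y})$, we obtain that the condition $\alpha\beta > 0$ is equivalent to $(\boldsymbol{x}, \boldsymbol{y}) < 0$.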

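Let us now consider the case that $\dim \langle \boldsymbol{e}, \boldsymbol{x}, \boldsymbol{y} \rangle = 3$. The space $\langle \boldsymbol{e}, \boldsymbol{x}, \boldsymbol{y} \rangle$ contains
the timelike vector $\boldsymbol{e}$. Consequently, by Lemma 7.53, it is a pseudo-Euclidean space,
and its subspace $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ is nondegenerate, since $(\boldsymbol{x}, \boldsymbol{y}) \ne 0$ and $(\boldsymbol{x}^2) = (\boldsymbol{y}^2) = 0$.
Thus here the decomposition (7.73) takes the form

$$\langle \boldsymbol{e}, \boldsymbol{x}, \boldsymbol{y} \rangle = \langle \boldsymbol{x}, \boldsymbol{y} \rangle \oplus \langle \boldsymbol{h} \rangle, \tag{7.81}$$

where the space $\langle \boldsymbol{h} \rangle = \langle \boldsymbol{x}, \boldsymbol{y} \rangle^{\perp}$ is one-dimensional. On the left-hand side of the
decomposition (7.81) stands a three-dimensional pseudo-Euclidean space, and the
space $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$ is a two-dimensional pseudo-Euclidean space; therefore, by the law
of inertia, the space $\langle \boldsymbol{h} \rangle$ is a Euclidean space. Thus for the vector $\boldsymbol{e}$, we have the
representation

$$\boldsymbol{e} = \alpha\boldsymbol{x} + \beta\boldsymbol{y} + \gamma\boldsymbol{h}, \qquad (\boldsymbol{h}, \boldsymbol{x}) = 0, \quad (\boldsymbol{h}, \boldsymbol{y}) = 0.$$

From this follow the equalities

$$(\boldsymbol{e}, \boldsymbol{x}) = \beta(\boldsymbol{x}, \boldsymbol{y}), \qquad (\boldsymbol{e}, \boldsymbol{y}) = \alpha(\boldsymbol{x}, \boldsymbol{y}), \qquad (\boldsymbol{e}^2) = 2\alpha\beta(\boldsymbol{x}, \boldsymbol{y}) + \gamma^2(\boldsymbol{h}^2).$$

Taking into account the fact that $(\boldsymbol{e}^2) < 0$ and $(\boldsymbol{h}^2) > 0$, from the last of these relationships,
we obtain that $\alpha\beta(\boldsymbol{x}, \boldsymbol{y}) < 0$. The condition that the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$
lie in one pole can be expressed as the inequality $(\boldsymbol{e}, \boldsymbol{x})(\boldsymbol{e}, \boldsymbol{y}) > 0$, that is, $\alpha\beta > 0$.
Since $\alpha\beta(\boldsymbol{x}, \boldsymbol{y}) < 0$, it follows as in the previous case that this is equivalent to the
condition $(\boldsymbol{x}, \boldsymbol{y}) < 0$. □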
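
The independence from the choice of the timelike vector can be observed on a few sample vectors. This sketch works in the three-dimensional space with the form $x_1 y_1 + x_2 y_2 - x_3 y_3$; the lists of lightlike and timelike vectors are arbitrary test data chosen for the illustration.

```python
def form(x, y):
    """Inner product x1*y1 + x2*y2 - x3*y3 of the space from Example 7.51."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

def same_pole(e, x, y):
    """Pole test relative to a timelike vector e: (e, x) and (e, y) have the same sign."""
    return form(e, x) * form(e, y) > 0

lightlike = [(3, 4, 5), (0, 1, 1), (1, 0, -1), (-5, 12, -13)]  # scalar squares are 0
timelike = [(0, 0, 1), (1, 1, 2), (2, -1, -3)]                 # scalar squares are negative

for i, x in enumerate(lightlike):
    for y in lightlike[i + 1:]:
        # One verdict per choice of e; the set collapses to a single value.
        verdicts = {same_pole(e, x, y) for e in timelike}
        print(x, y, form(x, y) < 0, verdicts)
```

For every pair the set of verdicts contains a single value, and that value agrees with the test $(\boldsymbol{x}, \boldsymbol{y}) < 0$.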

Remark 7.56 As we did in Sect. 3.2 in connection with the partition of a vector
space $\mathsf{L}$ by a hyperplane $\mathsf{L}'$, it is possible to ascertain that the partition of the set
$V \setminus \boldsymbol{0}$ coincides with its partition into two path-connected components $V_+$ and $V_-$.
From this we can obtain another proof of Theorem 7.55 without using any formulas.

In a pseudo-Euclidean space, the following remarkable relationship holds.

Theorem 7.57 For every pair of timelike vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, the reverse of the
Cauchy-Schwarz inequality is satisfied:

$$(\boldsymbol{x}, \boldsymbol{y})^2 \ge (\boldsymbol{x}^2) \cdot (\boldsymbol{y}^2), \tag{7.82}$$

which reduces to an equality if and only if $\boldsymbol{x}$ and $\boldsymbol{y}$ are proportional.

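Proof Let us consider the subspace $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$, in which are contained all the vectors of
interest to us. If the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are proportional, that is, $\boldsymbol{y} = \lambda\boldsymbol{x}$, where $\lambda$ is
some scalar, then the inequality (7.82) obviously reduces to a tautological equality.
Thus we may assume that $\dim \langle \boldsymbol{x}, \boldsymbol{y} \rangle = 2$. Since this plane contains timelike vectors,
it is a pseudo-Euclidean space by Lemma 7.53, that is, we may suppose ourselves to be in
the situation considered in Example 7.49.

As we have seen, in the space $\langle \boldsymbol{x}, \boldsymbol{y} \rangle$, there exists a basis $\boldsymbol{f}_1, \boldsymbol{f}_2$ for which the
relationship $(\boldsymbol{f}_1^2) = (\boldsymbol{f}_2^2) = 0$, $(\boldsymbol{f}_1, \boldsymbol{f}_2) = \frac{1}{2}$ holds. Writing the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ in
this basis, we obtain the expressions

$$\boldsymbol{x} = x_1\boldsymbol{f}_1 + x_2\boldsymbol{f}_2, \qquad \boldsymbol{y} = y_1\boldsymbol{f}_1 + y_2\boldsymbol{f}_2,$$

from which it follows that

$$(\boldsymbol{x}^2) = x_1 x_2, \qquad (\boldsymbol{y}^2) = y_1 y_2, \qquad (\boldsymbol{x}, \boldsymbol{y}) = \tfrac{1}{2}(x_1 y_2 + x_2 y_1).$$

Substituting these expressions into (7.82), we see that we have to verify the inequality
$(x_1 y_2 + x_2 y_1)^2 \ge 4 x_1 x_2 y_1 y_2$. Having carried out in the last inequality the obvious
transformations, we see that this is equivalent to the inequality

$$(x_1 y_2 - x_2 y_1)^2 \ge 0, \tag{7.83}$$

which holds for all real values of the variables. Moreover, it is obvious that the
inequality (7.83) reduces to an equality if and only if $x_1 y_2 - x_2 y_1 = 0$, that is, if and
only if the determinant

$$\begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix}$$

equals 0, and this implies that the vectors $\boldsymbol{x} = (x_1, x_2)$
and $\boldsymbol{y} = (y_1, y_2)$ are proportional. □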
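
A quick numerical check of the reversed inequality, offered as a sketch: it uses the three-dimensional form $x_1 y_1 + x_2 y_2 - x_3 y_3$ and sample timelike vectors chosen for illustration; the function names are assumptions.

```python
def form(x, y):
    """Inner product x1*y1 + x2*y2 - x3*y3 (index of inertia s = n - 1 = 2)."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

def reverse_cauchy_schwarz_gap(x, y):
    """(x, y)^2 - (x^2)(y^2); by Theorem 7.57 this is >= 0 for timelike x and y."""
    return form(x, y) ** 2 - form(x, x) * form(y, y)

x, y = (1, 0, 2), (0, 1, 3)  # both timelike: scalar squares -3 and -8
print(reverse_cauchy_schwarz_gap(x, y))          # 12
print(reverse_cauchy_schwarz_gap(x, (3, 0, 6)))  # 0, since (3, 0, 6) = 3x
```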


From Theorem 7.57 we obtain the following useful corollary.

Corollary 7.58 Two timelike vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ cannot be orthogonal.

Proof Indeed, if $(\boldsymbol{x}, \boldsymbol{y}) = 0$, then from the inequality (7.82), it follows that
$(\boldsymbol{x}^2) \cdot (\boldsymbol{y}^2) \le 0$, and this contradicts the conditions $(\boldsymbol{x}^2) < 0$ and $(\boldsymbol{y}^2) < 0$. □

Similar to the partition of the light cone $V$ into two poles, we can also partition
its interior into two parts. Namely, we shall say that timelike vectors $\boldsymbol{e}$ and $\boldsymbol{e}'$ lie
inside one pole of the light cone $V$ if the inner products $(\boldsymbol{e}, \boldsymbol{x})$ and $(\boldsymbol{e}', \boldsymbol{x})$ have the
same sign for all vectors $\boldsymbol{x} \in V$ and lie inside different poles if these inner products
have opposite signs.

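A set $M \subset \mathsf{L}$ is said to be convex if for every pair of vectors $\boldsymbol{e}, \boldsymbol{e}' \in M$, the vectors
$\boldsymbol{g}_t = t\boldsymbol{e} + (1 - t)\boldsymbol{e}'$ are also in $M$ for all $t \in [0, 1]$. We shall prove that the interior
of each pole of the light cone $V$ is convex, that is, the vector $\boldsymbol{g}_t$ lies in the same
pole as $\boldsymbol{e}$ and $\boldsymbol{e}'$ for all $t \in [0, 1]$. To this end, let us note that in the expression
$(\boldsymbol{g}_t, \boldsymbol{x}) = t(\boldsymbol{e}, \boldsymbol{x}) + (1 - t)(\boldsymbol{e}', \boldsymbol{x})$, the coefficients $t$ and $1 - t$ are nonnegative, and
the inner products $(\boldsymbol{e}, \boldsymbol{x})$ and $(\boldsymbol{e}', \boldsymbol{x})$ have the same sign. Therefore, for every vector
$\boldsymbol{x} \in V$, the inner product $(\boldsymbol{g}_t, \boldsymbol{x})$ has the same sign as $(\boldsymbol{e}, \boldsymbol{x})$ and $(\boldsymbol{e}', \boldsymbol{x})$.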

Lemma 7.59 Timelike vectors $\boldsymbol{e}$ and $\boldsymbol{e}'$ lie inside one pole of the light cone $V$ if and
only if $(\boldsymbol{e}, \boldsymbol{e}') < 0$.

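Proof If timelike vectors $\boldsymbol{e}$ and $\boldsymbol{e}'$ lie inside one pole, then by definition, we have
the inequality $(\boldsymbol{e}, \boldsymbol{x}) \cdot (\boldsymbol{e}', \boldsymbol{x}) > 0$ for all nonnull $\boldsymbol{x} \in V$. Let us assume that $(\boldsymbol{e}, \boldsymbol{e}') \ge 0$. As we
established above, the vector $\boldsymbol{g}_t = t\boldsymbol{e} + (1 - t)\boldsymbol{e}'$ is timelike and lies inside the same
pole as $\boldsymbol{e}$ and $\boldsymbol{e}'$ for all $t \in [0, 1]$.

Let us consider the inner product $(\boldsymbol{g}_t, \boldsymbol{e}) = t(\boldsymbol{e}, \boldsymbol{e}) + (1 - t)(\boldsymbol{e}, \boldsymbol{e}')$ as a function
of the variable $t \in [0, 1]$. It is obvious that this function is continuous and that it
assumes for $t = 0$ the value $(\boldsymbol{e}, \boldsymbol{e}') \ge 0$, and for $t = 1$ the value $(\boldsymbol{e}, \boldsymbol{e}) < 0$. Therefore,
as is proved in a course in calculus, there exists a value $\tau \in [0, 1]$ such that
$(\boldsymbol{g}_\tau, \boldsymbol{e}) = 0$. But this contradicts Corollary 7.58.

Thus we have proved that if vectors $\boldsymbol{e}$ and $\boldsymbol{e}'$ lie inside one pole of the cone $V$,
then $(\boldsymbol{e}, \boldsymbol{e}') < 0$. The converse assertion is obvious. Let $\boldsymbol{e}$ and $\boldsymbol{e}'$ lie inside different
poles, for instance, $\boldsymbol{e}$ is within $V_+$, while $\boldsymbol{e}'$ is within $V_-$. Then we have by definition
that the vector $-\boldsymbol{e}'$ lies inside the pole $V_+$, and therefore, $(\boldsymbol{e}, -\boldsymbol{e}') < 0$, that is,
$(\boldsymbol{e}, \boldsymbol{e}') > 0$. □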
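
The criterion of Lemma 7.59 and the convexity of the interior of a pole can be illustrated together. The sketch below uses the three-dimensional form $x_1 y_1 + x_2 y_2 - x_3 y_3$; the sample vectors and the helper names are assumptions made for the illustration.

```python
from fractions import Fraction

def form(x, y):
    """Inner product x1*y1 + x2*y2 - x3*y3."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

def inside_same_pole(e, e_prime):
    """Criterion of Lemma 7.59 for timelike vectors: (e, e') < 0."""
    return form(e, e_prime) < 0

def segment_point(e, e_prime, t):
    """The vector g_t = t*e + (1 - t)*e'."""
    return tuple(t * a + (1 - t) * b for a, b in zip(e, e_prime))

e, e_prime = (1, 1, 2), (0, -1, 3)  # both timelike: scalar squares -2 and -8
print(inside_same_pole(e, e_prime))  # True, since (e, e') = -7

# Every sampled point of the segment between e and e' stays timelike.
steps = [Fraction(k, 10) for k in range(11)]
points = [segment_point(e, e_prime, t) for t in steps]
print(all(form(g, g) < 0 for g in points))  # True
```

Replacing `e_prime` by its negative moves it to the other pole, and the criterion then returns `False`.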


7.8 Lorentz Transformations 

In this section, we shall examine an analogue of orthogonal transformations for
pseudo-Euclidean spaces called Lorentz transformations. Such transformations have
numerous applications in physics.⁹ They are also defined by the condition of preserving
the inner product.


⁹ For example, a Lorentz transformation of Minkowski space (a four-dimensional pseudo-Euclidean
space) plays the same role in the special theory of relativity (which is where the term
Lorentz transformation comes from) as that played by the Galilean transformations, which describe
the passage from one inertial reference frame to another in classical Newtonian mechanics.

Definition 7.60 A linear transformation $\mathcal{U}$ of a pseudo-Euclidean space $\mathsf{L}$ is called
a Lorentz transformation if the relationship

$$(\mathcal{U}(\boldsymbol{x}), \mathcal{U}(\boldsymbol{y})) = (\boldsymbol{x}, \boldsymbol{y}) \tag{7.84}$$

is satisfied for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$.

As in the case of orthogonal transformations, it suffices that the condition (7.84)
be satisfied for all vectors $\boldsymbol{x} = \boldsymbol{y}$ of the pseudo-Euclidean space $\mathsf{L}$. The proof of this
coincides completely with the proof of the analogous assertion in Sect. 7.2.

Here, as in the case of Euclidean spaces, we shall make use of the inner product
$(\boldsymbol{x}, \boldsymbol{y})$ in order to identify $\mathsf{L}^{*}$ with $\mathsf{L}$ (let us recall that for this, we need only the
nonsingularity of the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ and not the positive definiteness of the
associated quadratic form $(\boldsymbol{x}^2)$). As a result, for an arbitrary linear transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$, we may consider $\mathcal{A}^{*}$ also as a transformation of the space $\mathsf{L}$ into itself.
Repeating the same arguments that we employed in the case of Euclidean spaces,
we obtain that $|\mathcal{A}^{*}| = |\mathcal{A}|$. In particular, from definition (7.84), it follows that for a
Lorentz transformation $\mathcal{U}$, we have the relationship

$$U^{*} A U = A, \tag{7.85}$$

where $U$ is the matrix of the transformation $\mathcal{U}$ in an arbitrary basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the
space $\mathsf{L}$, and $A = (a_{ij})$ is the Gram matrix of the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$, that is, the
matrix with elements $a_{ij} = (\boldsymbol{e}_i, \boldsymbol{e}_j)$.

The bilinear form $(\boldsymbol{x}, \boldsymbol{y})$ is nonsingular, that is, $|A| \ne 0$, and from the relationship
(7.85) follows the equality $|\mathcal{U}|^2 = 1$, from which we obtain that $|\mathcal{U}| = \pm 1$. As in
the case of a Euclidean space, a transformation with determinant equal to 1 is called
proper, while if the determinant is equal to $-1$, it is improper.
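
The matrix condition (7.85) is easy to test on concrete matrices. The sketch below takes the two-dimensional case with Gram matrix $\operatorname{diag}(1, -1)$ and reads $U^{*}$ as the transpose of $U$; the rational "boost" with entries $5/3$ and $4/3$ and the reflection `flip` are sample matrices chosen for the illustration, and the helper names are assumptions.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def is_lorentz(u, gram):
    """Check the matrix condition U* A U = A, with U* the transpose of U."""
    return matmul(matmul(transpose(u), gram), u) == gram

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[1, 0], [0, -1]]                  # Gram matrix of an orthonormal basis, n = 2, s = 1
c, s = Fraction(5, 3), Fraction(4, 3)  # chosen so that c^2 - s^2 = 1
boost = [[c, s], [s, c]]
flip = [[1, 0], [0, -1]]

print(is_lorentz(boost, A), det2(boost))  # True 1   (proper)
print(is_lorentz(flip, A), det2(flip))    # True -1  (improper)
```

A matrix that merely rescales a basis vector, such as `[[2, 0], [0, 1]]`, fails the test.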

It follows from the definition that every Lorentz transformation maps the light
cone $V$ into itself. It follows from Theorem 7.55 that a Lorentz transformation either
maps each pole into itself (that is, $\mathcal{U}(V_+) = V_+$ and $\mathcal{U}(V_-) = V_-$), or else interchanges
them (that is, $\mathcal{U}(V_+) = V_-$ and $\mathcal{U}(V_-) = V_+$). Let us associate with each
Lorentz transformation $\mathcal{U}$ the number $\nu(\mathcal{U}) = +1$ in the first case, and $\nu(\mathcal{U}) = -1$
in the second. Like the determinant $|\mathcal{U}|$, this number $\nu(\mathcal{U})$ is a natural characteristic
of the associated Lorentz transformation. Let us denote the pair of numbers
$(|\mathcal{U}|, \nu(\mathcal{U}))$ by $\varepsilon(\mathcal{U})$. It is obvious that

$$\varepsilon(\mathcal{U}^{-1}) = \varepsilon(\mathcal{U}), \qquad \varepsilon(\mathcal{U}_1\mathcal{U}_2) = \varepsilon(\mathcal{U}_1)\,\varepsilon(\mathcal{U}_2),$$

where on the right-hand side, it is understood that the first and second components
of the pairs are multiplied separately. We shall soon see that in an arbitrary pseudo-Euclidean
space, there exist Lorentz transformations $\mathcal{U}$ of all four types, that is,
with $\varepsilon(\mathcal{U})$ taking all values

$$(+1, +1), \qquad (+1, -1), \qquad (-1, +1), \qquad (-1, -1).$$

This property is sometimes interpreted as saying that a pseudo-Euclidean space has
not two (as in the case of a Euclidean space), but four orientations.

Like orthogonal transformations of a Euclidean space, Lorentz transformations
are characterized by the fact that they map an orthonormal basis of a pseudo-Euclidean
space to an orthonormal basis. Indeed, suppose that for the vectors of
the orthonormal basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, the equalities

$$(\boldsymbol{e}_i, \boldsymbol{e}_j) = 0 \ \text{ for } i \ne j, \qquad (\boldsymbol{e}_1^2) = \cdots = (\boldsymbol{e}_{n-1}^2) = 1, \qquad (\boldsymbol{e}_n^2) = -1 \tag{7.86}$$

are satisfied. Then from the condition (7.84), it follows that the images $\mathcal{U}(\boldsymbol{e}_1), \ldots,
\mathcal{U}(\boldsymbol{e}_n)$ satisfy analogous equalities, that is, they form an orthonormal basis in $\mathsf{L}$.
Conversely, if for the vectors $\boldsymbol{e}_i$, the equalities (7.86) are satisfied and analogous equalities
hold for the vectors $\mathcal{U}(\boldsymbol{e}_i)$, then as is easily verified, for arbitrary vectors $\boldsymbol{x}$ and
$\boldsymbol{y}$ of the pseudo-Euclidean space $\mathsf{L}$, the relationship (7.84) is satisfied.

Two orthonormal bases are said to have the same orientation if for a Lorentz
transformation $\mathcal{U}$ taking one basis to the other, $\varepsilon(\mathcal{U}) = (+1, +1)$. The choice of
a class of bases with the same orientation is called an orientation of the
pseudo-Euclidean space $\mathsf{L}$. Taking for now on faith the fact (which will be proved a little
later) that there exist Lorentz transformations $\mathcal{U}$ with all theoretically possible
$\varepsilon(\mathcal{U})$, we see that in a pseudo-Euclidean space, it is possible to introduce exactly
four orientations.

Example 7.61 Let us consider some concepts about pseudo-Euclidean spaces that
we encountered in Example 7.49, that is, for $\dim \mathsf{L} = 2$ and $s = 1$. As we have seen,
in this space, there exists a basis $\mathbf{f}_1, \mathbf{f}_2$ for which the relationships
$(\mathbf{f}_1^2) = (\mathbf{f}_2^2) = 0$, $(\mathbf{f}_1, \mathbf{f}_2) = \frac{1}{2}$ are satisfied, and the scalar square of the vector
$\mathbf{x} = x\mathbf{f}_1 + y\mathbf{f}_2$ is equal to $(\mathbf{x}^2) = xy$. If $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ is a Lorentz transformation
given by the formula

$$x' = ax + by, \qquad y' = cx + dy,$$

then the equality $(\mathcal{U}(\mathbf{x}), \mathcal{U}(\mathbf{x})) = (\mathbf{x}, \mathbf{x})$ for the vector $\mathbf{x} = x\mathbf{f}_1 + y\mathbf{f}_2$ takes the
form $x'y' = xy$, that is, $(ax + by)(cx + dy) = xy$ for all $x$ and $y$. From this, we
obtain

$$ac = 0, \qquad bd = 0, \qquad ad + bc = 1.$$

In view of the equality $ad + bc = 1$, the values $a = b = 0$ are impossible.

If $a \neq 0$, then $c = 0$, and this implies that $ad = 1$, that is, $d = a^{-1} \neq 0$ and $b = 0$.
Thus the transformation $\mathcal{U}$ has the form

$$x' = ax, \qquad y' = a^{-1}y. \tag{7.87}$$

This is a proper transformation.

On the other hand, if $b \neq 0$, then $d = 0$, and this implies that $c = b^{-1}$, $a = 0$. The
transformation $\mathcal{U}$ has in this case the form

$$x' = by, \qquad y' = b^{-1}x. \tag{7.88}$$




This is an improper transformation. 

If we write the transformation $\mathcal{U}$ in the form (7.87) or (7.88), depending on
whether it is proper or improper, then the sign of the number $a$ or respectively $b$
indicates whether $\mathcal{U}$ interchanges the poles of the light cone or preserves each of
them. Namely, let us prove that the transformation (7.87) causes the poles to change
places if $a < 0$, and preserves them if $a > 0$. For the transformation (7.88) the roles
of the signs are reversed: it interchanges the poles if $b > 0$ and preserves them if $b < 0$.

By Theorem 7.55, the partition of the light cone $V$ into two poles $V_+$ and $V_-$
does not depend on the choice of timelike vector, and therefore, by Lemma 7.59, we
need only determine the sign of the inner product $(\mathbf{e}, \mathcal{U}(\mathbf{e}))$ for an arbitrary timelike
vector $\mathbf{e}$. Let $\mathbf{e} = x\mathbf{f}_1 + y\mathbf{f}_2$. Then $(\mathbf{e}^2) = xy < 0$. In the case that $\mathcal{U}$ is a proper
transformation, we have formula (7.87), from which it follows that

$$\mathcal{U}(\mathbf{e}) = ax\,\mathbf{f}_1 + a^{-1}y\,\mathbf{f}_2, \qquad (\mathbf{e}, \mathcal{U}(\mathbf{e})) = \tfrac{1}{2}(a + a^{-1})\,xy.$$

Since $xy < 0$, the inner product $(\mathbf{e}, \mathcal{U}(\mathbf{e}))$ is negative if $a + a^{-1} > 0$, and positive if
$a + a^{-1} < 0$. But it is obvious that $a + a^{-1} > 0$ for $a > 0$, and $a + a^{-1} < 0$ for $a < 0$.
Thus for $a > 0$, we have $(\mathbf{e}, \mathcal{U}(\mathbf{e})) < 0$, and by Lemma 7.59, the vectors $\mathbf{e}$ and $\mathcal{U}(\mathbf{e})$
lie inside one pole. Consequently, the transformation $\mathcal{U}$ preserves the poles of the
light cone. Analogously, for $a < 0$, we obtain $(\mathbf{e}, \mathcal{U}(\mathbf{e})) > 0$, that is, $\mathbf{e}$ and $\mathcal{U}(\mathbf{e})$ lie
inside different poles, and therefore, the transformation $\mathcal{U}$ interchanges the poles.

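The case of an improper transformation can be examined with the help of
formula (7.88). Reasoning analogously to what has gone before, we obtain from it the
relationships

$$\mathcal{U}(\mathbf{e}) = by\,\mathbf{f}_1 + b^{-1}x\,\mathbf{f}_2, \qquad (\mathbf{e}, \mathcal{U}(\mathbf{e})) = \tfrac{1}{2}\bigl(b^{-1}x^2 + by^2\bigr),$$

from which it is clear that now the sign of $(\mathbf{e}, \mathcal{U}(\mathbf{e}))$ coincides with the sign of the
number $b$. By Lemma 7.59, the vectors $\mathbf{e}$ and $\mathcal{U}(\mathbf{e})$ therefore lie inside different poles
for $b > 0$ and inside one pole for $b < 0$.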
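
The sign rule of Example 7.61 can be tried out numerically. The following is a minimal sketch in exact rational arithmetic, assuming the basis with $(\mathbf{f}_1^2) = (\mathbf{f}_2^2) = 0$ and $(\mathbf{f}_1, \mathbf{f}_2) = \frac{1}{2}$; the function names `inner`, `proper`, `improper`, and `nu` are illustrative and do not come from the text. It checks that both kinds of transformation preserve the scalar square and reads off $\nu$ from the sign of $(\mathbf{e}, \mathcal{U}(\mathbf{e}))$, as in Lemma 7.59.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product in the basis f1, f2 with (f1,f1) = (f2,f2) = 0, (f1,f2) = 1/2."""
    return Fraction(1, 2) * (u[0] * v[1] + u[1] * v[0])

def proper(a):
    """Transformation (7.87): x' = a*x, y' = y/a."""
    return lambda v: (a * v[0], v[1] / a)

def improper(b):
    """Transformation (7.88): x' = b*y, y' = x/b."""
    return lambda v: (b * v[1], v[0] / b)

def nu(u, e):
    """+1 if e and u(e) lie in one pole, i.e. (e, u(e)) < 0; otherwise -1."""
    return 1 if inner(e, u(e)) < 0 else -1

# A timelike vector: its scalar square is x*y = -3 < 0.
e = (Fraction(1), Fraction(-3))

for k in (Fraction(2), Fraction(-2), Fraction(1, 5), Fraction(-1, 5)):
    for u in (proper(k), improper(k)):
        # Every transformation of either form preserves the scalar square.
        assert inner(u(e), u(e)) == inner(e, e)
    print(k, nu(proper(k), e), nu(improper(k), e))
```

Using `Fraction` avoids rounding, so the equality of scalar squares is tested exactly rather than up to a tolerance.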


Example 7.62 It is sometimes convenient to use the fact that a Lorentz
transformation of a pseudo-Euclidean plane can be written in an alternative form, using
the hyperbolic sine and cosine. We saw earlier (formulas (7.87) and (7.88)) that in
the basis $\mathbf{f}_1, \mathbf{f}_2$ defined by the relationship (7.78), proper and improper Lorentz
transformations are given respectively by the equalities

$$\mathcal{U}(\mathbf{f}_1) = a\mathbf{f}_1, \qquad \mathcal{U}(\mathbf{f}_2) = a^{-1}\mathbf{f}_2,$$

$$\mathcal{U}(\mathbf{f}_1) = b\mathbf{f}_2, \qquad \mathcal{U}(\mathbf{f}_2) = b^{-1}\mathbf{f}_1.$$

From this, it is not difficult to derive that in the orthonormal basis $\mathbf{e}_1, \mathbf{e}_2$, related
to $\mathbf{f}_1, \mathbf{f}_2$ by formula (7.78), these transformations are given respectively by the
equalities

$$\mathcal{U}(\mathbf{e}_1) = \frac{a + a^{-1}}{2}\,\mathbf{e}_1 + \frac{a - a^{-1}}{2}\,\mathbf{e}_2, \qquad
\mathcal{U}(\mathbf{e}_2) = \frac{a - a^{-1}}{2}\,\mathbf{e}_1 + \frac{a + a^{-1}}{2}\,\mathbf{e}_2, \tag{7.89}$$

$$\mathcal{U}(\mathbf{e}_1) = \frac{b + b^{-1}}{2}\,\mathbf{e}_1 - \frac{b - b^{-1}}{2}\,\mathbf{e}_2, \qquad
\mathcal{U}(\mathbf{e}_2) = \frac{b - b^{-1}}{2}\,\mathbf{e}_1 - \frac{b + b^{-1}}{2}\,\mathbf{e}_2. \tag{7.90}$$

Setting here $a = \pm e^{\psi}$ and $b = \pm e^{\psi}$, where the sign $\pm$ coincides with the sign of the
number $a$ or $b$ in formula (7.89) or (7.90) respectively, we obtain that the matrices
of the proper transformations have the form

$$\begin{pmatrix} \cosh\psi & \sinh\psi \\ \sinh\psi & \cosh\psi \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} -\cosh\psi & -\sinh\psi \\ -\sinh\psi & -\cosh\psi \end{pmatrix}, \tag{7.91}$$

while the matrices of the improper transformations have the form

$$\begin{pmatrix} \cosh\psi & \sinh\psi \\ -\sinh\psi & -\cosh\psi \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} -\cosh\psi & -\sinh\psi \\ \sinh\psi & \cosh\psi \end{pmatrix}, \tag{7.92}$$

where $\sinh\psi = (e^{\psi} - e^{-\psi})/2$ and $\cosh\psi = (e^{\psi} + e^{-\psi})/2$ are the hyperbolic sine
and cosine.
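
The passage from (7.89) and (7.90) to the hyperbolic form can be checked directly. The sketch below assumes the orthonormal basis has Gram matrix $G = \mathrm{diag}(1, -1)$ and that the columns of a matrix hold the coordinates of the images of the basis vectors; the helper names are illustrative only. It verifies $U^{*} G U = G$ exactly for rational $a$ and $b$, confirms the determinants $+1$ and $-1$, and compares the entries with $\cosh\psi$ and $\sinh\psi$ for $a = e^{\psi}$.

```python
import math
from fractions import Fraction

# Gram matrix of an orthonormal basis e1, e2 with (e1,e1) = 1 and (e2,e2) = -1.
G = [[1, 0], [0, -1]]

def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(p):
    return [list(row) for row in zip(*p)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def proper_matrix(a):
    """Matrix of (7.89); column i holds the coordinates of U(e_i)."""
    s, d = (a + 1 / a) / 2, (a - 1 / a) / 2
    return [[s, d], [d, s]]

def improper_matrix(b):
    """Matrix of (7.90); column i holds the coordinates of U(e_i)."""
    s, d = (b + 1 / b) / 2, (b - 1 / b) / 2
    return [[s, d], [-d, -s]]

def preserves_form(u):
    """True if U^T G U = G, the matrix form of preserving the inner product."""
    return matmul(transpose(u), matmul(G, u)) == G

for k in (Fraction(3), Fraction(-3), Fraction(1, 4)):
    assert preserves_form(proper_matrix(k)) and det2(proper_matrix(k)) == 1
    assert preserves_form(improper_matrix(k)) and det2(improper_matrix(k)) == -1

# With a = exp(psi), the entries are cosh(psi) and sinh(psi) up to rounding.
psi = 0.7
m = proper_matrix(math.exp(psi))
assert math.isclose(m[0][0], math.cosh(psi)) and math.isclose(m[0][1], math.sinh(psi))
```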



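Theorem 7.63 In every pseudo-Euclidean space there exist Lorentz transformations
$\mathcal{U}$ with all four possible values of $\varepsilon(\mathcal{U})$.

Proof For the case $\dim \mathsf{L} = 2$, we have already proved the theorem: in
Example 7.62, we saw that there exist four distinct types of Lorentz transformation of a
pseudo-Euclidean space having in a suitable orthonormal basis the matrices (7.91),
(7.92). It is obvious that with these matrices, the transformation $\mathcal{U}$ gives all possible
values $\varepsilon(\mathcal{U})$.

Let us now move on to the general case $\dim \mathsf{L} > 2$. Let us choose in the
pseudo-Euclidean space $\mathsf{L}$ an arbitrary timelike vector $\mathbf{e}$ and any $\mathbf{e}'$ not proportional to it.
By Lemma 7.53, the two-dimensional space $\langle \mathbf{e}, \mathbf{e}' \rangle$ is a pseudo-Euclidean space
(therefore nondegenerate), and we have the decomposition

$$\mathsf{L} = \langle \mathbf{e}, \mathbf{e}' \rangle \oplus \langle \mathbf{e}, \mathbf{e}' \rangle^{\perp}.$$

From the law of inertia, it follows that the space $\langle \mathbf{e}, \mathbf{e}' \rangle^{\perp}$ is a Euclidean space. In
Example 7.62, we saw that in the pseudo-Euclidean plane $\langle \mathbf{e}, \mathbf{e}' \rangle$, there exists a Lorentz
transformation $\mathcal{U}_1$ with arbitrary value $\varepsilon(\mathcal{U}_1)$. Let us define the transformation
$\mathcal{U} : \mathsf{L} \to \mathsf{L}$ as $\mathcal{U}_1$ in $\langle \mathbf{e}, \mathbf{e}' \rangle$ and $\mathcal{E}$ in $\langle \mathbf{e}, \mathbf{e}' \rangle^{\perp}$, that is, for a vector $\mathbf{x} = \mathbf{y} + \mathbf{z}$, where
$\mathbf{y} \in \langle \mathbf{e}, \mathbf{e}' \rangle$ and $\mathbf{z} \in \langle \mathbf{e}, \mathbf{e}' \rangle^{\perp}$, we shall set $\mathcal{U}(\mathbf{x}) = \mathcal{U}_1(\mathbf{y}) + \mathbf{z}$. Then $\mathcal{U}$ is clearly a
Lorentz transformation, and $\varepsilon(\mathcal{U}) = \varepsilon(\mathcal{U}_1)$. □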

There is an analogue to Theorem 7.24 for Lorentz transformations. 

Theorem 7.64 If a space $\mathsf{L}'$ is invariant with respect to a Lorentz transformation
$\mathcal{U}$, then its orthogonal complement $(\mathsf{L}')^{\perp}$ is also invariant with respect to $\mathcal{U}$.

Proof The proof of this theorem is an exact repetition of the proof of Theorem 7.24,
since there, we did not use the positive definiteness of the quadratic form $(\mathbf{x}^2)$
associated with the bilinear form $(\mathbf{x}, \mathbf{y})$, but only its nonsingularity. See Remark 7.25
on p. 227. □

The study of a Lorentz transformation of a pseudo-Euclidean space is reduced to
the analogous question for orthogonal transformations of a Euclidean space, based
on the following result.

Theorem 7.65 For every Lorentz transformation $\mathcal{U}$ of a pseudo-Euclidean space
$\mathsf{L}$, there exist nondegenerate subspaces $\mathsf{L}_0$ and $\mathsf{L}_1$ invariant with respect to $\mathcal{U}$ such
that $\mathsf{L}$ has the orthogonal decomposition

$$\mathsf{L} = \mathsf{L}_0 \oplus \mathsf{L}_1, \qquad \mathsf{L}_0 \perp \mathsf{L}_1, \tag{7.93}$$

where the subspace $\mathsf{L}_0$ is a Euclidean space, and the dimension of $\mathsf{L}_1$ is equal to 1,
2, or 3.

It follows from the law of inertia that if $\dim \mathsf{L}_1 = 1$, then $\mathsf{L}_1$ is spanned by a
timelike vector. If $\dim \mathsf{L}_1 = 2$ or 3, then the pseudo-Euclidean space $\mathsf{L}_1$ can be
represented in turn by a direct sum of subspaces of lower dimension invariant with
respect to $\mathcal{U}$. However, such a decomposition is no longer necessarily orthogonal
(see Example 7.48).

Proof of Theorem 7.65 The proof is by induction on $n$, the dimension of the space $\mathsf{L}$.
For $n = 2$, the assertion of the theorem is obvious: in the decomposition (7.93) one
has only to set $\mathsf{L}_0 = (\mathbf{0})$ and $\mathsf{L}_1 = \mathsf{L}$.$^{10}$

Now let $n > 2$, and suppose that the assertion of the theorem has been proved for
all pseudo-Euclidean spaces of dimension less than $n$. We shall use results obtained
in Chaps. 4 and 5 on linear transformations of a vector space into itself. Obviously,
one of the following three cases must hold: the transformation $\mathcal{U}$ has a complex
eigenvalue, $\mathcal{U}$ has two linearly independent eigenvectors, or the space $\mathsf{L}$ is cyclic
for $\mathcal{U}$, corresponding to the only real eigenvalue. Let us consider the three cases
separately.

Case 1. A linear transformation $\mathcal{U}$ of a real vector space $\mathsf{L}$ has a complex
eigenvalue $\lambda$. As established in Sect. 4.3, $\mathcal{U}$ then also has the complex conjugate
eigenvalue $\bar{\lambda}$, and moreover, to the pair $\lambda, \bar{\lambda}$ there corresponds a two-dimensional real
invariant subspace $\mathsf{L}' \subset \mathsf{L}$, which contains no real eigenvectors. It is obvious that $\mathsf{L}'$
cannot be a pseudo-Euclidean space: for then the restriction of $\mathcal{U}$ to $\mathsf{L}'$ would have
real eigenvalues, and $\mathsf{L}'$ would contain real eigenvectors of the transformation $\mathcal{U}$;
see Examples 7.61 and 7.62. Let us show that $\mathsf{L}'$ is nondegenerate.


$^{10}$ The nondegeneracy of the subspace $\mathsf{L}_0 = (\mathbf{0})$ relative to a bilinear form follows from the
definitions given on pages 266 and 195. Indeed, the rank of the restriction of the bilinear form to the
subspace $(\mathbf{0})$ is zero, and therefore, it coincides with $\dim(\mathbf{0})$.

Suppose that $\mathsf{L}'$ is degenerate. Then it contains a lightlike vector $\mathbf{e} \neq \mathbf{0}$. Since $\mathcal{U}$
is a Lorentz transformation, the vector $\mathcal{U}(\mathbf{e})$ is also lightlike, and since the subspace
$\mathsf{L}'$ is invariant with respect to $\mathcal{U}$, it follows that $\mathcal{U}(\mathbf{e})$ is contained in $\mathsf{L}'$. Therefore,
the subspace $\mathsf{L}'$ contains two lightlike vectors: $\mathbf{e}$ and $\mathcal{U}(\mathbf{e})$. By Lemma 7.53, these
vectors cannot be linearly independent, since then $\mathsf{L}'$ would be a pseudo-Euclidean
space, but that would contradict our assumption that $\mathsf{L}'$ is degenerate. From this, it
follows that the vector $\mathcal{U}(\mathbf{e})$ is proportional to $\mathbf{e}$, and that implies that $\mathbf{e}$ is an
eigenvector of the transformation $\mathcal{U}$, which, as we have observed above, cannot be. This
contradiction means that the subspace $\mathsf{L}'$ is nondegenerate, and as a consequence, it
is a Euclidean space.


Case 2. The linear transformation $\mathcal{U}$ has two linearly independent eigenvectors: $\mathbf{e}_1$
and $\mathbf{e}_2$. If at least one of them is not lightlike, that is, $(\mathbf{e}_i^2) \neq 0$, then $\mathsf{L}' = \langle \mathbf{e}_i \rangle$ is
a nondegenerate invariant subspace of dimension 1. And if both eigenvectors $\mathbf{e}_1$
and $\mathbf{e}_2$ are lightlike, then by Lemma 7.53, the subspace $\mathsf{L}' = \langle \mathbf{e}_1, \mathbf{e}_2 \rangle$ is an invariant
pseudo-Euclidean plane.

Thus in both cases, the transformation $\mathcal{U}$ has a nondegenerate invariant subspace
$\mathsf{L}'$ of dimension 1 or 2. This means that in both cases, we have an orthogonal
decomposition (7.73), that is, $\mathsf{L} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}$. If $\mathsf{L}'$ is one-dimensional and spanned by
a timelike vector or is a pseudo-Euclidean plane, then this is exactly decomposition
(7.93) with $\mathsf{L}_0 = (\mathsf{L}')^{\perp}$ and $\mathsf{L}_1 = \mathsf{L}'$. In the opposite case, the subspace $\mathsf{L}'$ is a
Euclidean space of dimension 1 or 2, and the subspace $(\mathsf{L}')^{\perp}$ is a pseudo-Euclidean
space of dimension $n - 1$ or $n - 2$ respectively. By the induction hypothesis, for
$(\mathsf{L}')^{\perp}$, we have the orthogonal decomposition $(\mathsf{L}')^{\perp} = \mathsf{L}'_0 \oplus \mathsf{L}'_1$ analogous to (7.93).
From this, for $\mathsf{L}$ we obtain the decomposition (7.93) with $\mathsf{L}_0 = \mathsf{L}' \oplus \mathsf{L}'_0$ and $\mathsf{L}_1 = \mathsf{L}'_1$.


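Case 3. The space $\mathsf{L}$ is cyclic for the transformation $\mathcal{U}$, corresponding to the unique
real eigenvalue $\lambda$ and principal vector $\mathbf{e}$ of grade $m = n$. Obviously, for $n = 2$, this
is impossible: as we saw in Example 7.61, in a suitable basis of a pseudo-Euclidean
plane, a Lorentz transformation has either diagonal form (7.87) or the form (7.88)
with distinct eigenvalues $\pm 1$. In both cases, it is obvious that the pseudo-Euclidean
plane $\mathsf{L}$ cannot be a cyclic subspace of the transformation $\mathcal{U}$.

Let us consider the case of a pseudo-Euclidean space $\mathsf{L}$ of dimension $n \geq 3$. We
shall prove that $\mathsf{L}$ can be a cyclic subspace of the transformation $\mathcal{U}$ only if $n = 3$.

As we established in Sect. 5.1, in a cyclic subspace $\mathsf{L}$, there is a basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$
defined by formula (5.5), that is,

$$\mathbf{e}_1 = \mathbf{e}, \qquad \mathbf{e}_2 = (\mathcal{U} - \lambda\mathcal{E})(\mathbf{e}), \qquad \ldots, \qquad \mathbf{e}_n = (\mathcal{U} - \lambda\mathcal{E})^{n-1}(\mathbf{e}), \tag{7.94}$$

in which relationships (5.6) hold:

$$\mathcal{U}(\mathbf{e}_1) = \lambda\mathbf{e}_1 + \mathbf{e}_2, \qquad \mathcal{U}(\mathbf{e}_2) = \lambda\mathbf{e}_2 + \mathbf{e}_3, \qquad \ldots, \qquad \mathcal{U}(\mathbf{e}_n) = \lambda\mathbf{e}_n. \tag{7.95}$$

In this basis, the matrix of the transformation $\mathcal{U}$ has the form of a Jordan block

$$U = \begin{pmatrix}
\lambda & 0 & 0 & \cdots & 0 & 0 \\
1 & \lambda & 0 & \cdots & 0 & 0 \\
0 & 1 & \lambda & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \lambda & 0 \\
0 & 0 & 0 & \cdots & 1 & \lambda
\end{pmatrix}. \tag{7.96}$$

It is easy to see that the eigenvector $\mathbf{e}_n$ is lightlike. Indeed, if we had $(\mathbf{e}_n^2) \neq 0$,
then we would have the orthogonal decomposition $\mathsf{L} = \langle \mathbf{e}_n \rangle \oplus \langle \mathbf{e}_n \rangle^{\perp}$, where both
subspaces $\langle \mathbf{e}_n \rangle$ and $\langle \mathbf{e}_n \rangle^{\perp}$ are invariant. But this contradicts the assumption that the
space $\mathsf{L}$ is cyclic.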

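Since $\mathcal{U}$ is a Lorentz transformation, it preserves the inner product of vectors,
and from (7.95), we obtain the equality

$$(\mathbf{e}_i, \mathbf{e}_n) = (\mathcal{U}(\mathbf{e}_i), \mathcal{U}(\mathbf{e}_n)) = (\lambda\mathbf{e}_i + \mathbf{e}_{i+1}, \lambda\mathbf{e}_n)
= \lambda^2(\mathbf{e}_i, \mathbf{e}_n) + \lambda(\mathbf{e}_{i+1}, \mathbf{e}_n) \tag{7.97}$$

for all $i = 1, \ldots, n - 1$.

If $\lambda^2 \neq 1$, then from (7.97), it follows that

$$(\mathbf{e}_i, \mathbf{e}_n) = \frac{\lambda}{1 - \lambda^2}\,(\mathbf{e}_{i+1}, \mathbf{e}_n).$$

Substituting into this equality the values of the index $i = n - 1, \ldots, 1$, taking into
account that $(\mathbf{e}_n^2) = 0$, we therefore obtain step by step that $(\mathbf{e}_i, \mathbf{e}_n) = 0$ for all $i$.
This means that the eigenvector $\mathbf{e}_n$ is contained in the radical of the space $\mathsf{L}$, and
since $\mathsf{L}$ is a pseudo-Euclidean space (that is, in particular, nondegenerate), it follows
that $\mathbf{e}_n = \mathbf{0}$. This contradiction shows that $\lambda^2 = 1$.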

Substituting $\lambda^2 = 1$ into the equalities (7.97) and collecting like terms, we find
that $(\mathbf{e}_{i+1}, \mathbf{e}_n) = 0$ for all indices $i = 1, \ldots, n - 1$, that is, $(\mathbf{e}_j, \mathbf{e}_n) = 0$ for all indices
$j = 2, \ldots, n$. In particular, we have the equalities $(\mathbf{e}_{n-1}, \mathbf{e}_n) = 0$ for $n > 2$ and
$(\mathbf{e}_{n-2}, \mathbf{e}_n) = 0$ for $n > 3$. From this it follows that $n = 3$. Indeed, from the condition
of preservation of the inner product, we have the relationship

$$(\mathbf{e}_{n-2}, \mathbf{e}_{n-1}) = (\mathcal{U}(\mathbf{e}_{n-2}), \mathcal{U}(\mathbf{e}_{n-1})) = (\lambda\mathbf{e}_{n-2} + \mathbf{e}_{n-1}, \lambda\mathbf{e}_{n-1} + \mathbf{e}_n)$$

$$= \lambda^2(\mathbf{e}_{n-2}, \mathbf{e}_{n-1}) + \lambda(\mathbf{e}_{n-2}, \mathbf{e}_n) + \lambda(\mathbf{e}_{n-1}^2) + (\mathbf{e}_{n-1}, \mathbf{e}_n),$$

from which, taking into account the relationships $\lambda^2 = 1$ and $(\mathbf{e}_{n-1}, \mathbf{e}_n) = 0$, we
have the equality $(\mathbf{e}_{n-2}, \mathbf{e}_n) + (\mathbf{e}_{n-1}^2) = 0$. If $n > 3$, then $(\mathbf{e}_{n-2}, \mathbf{e}_n) = 0$, and from
this, we obtain that $(\mathbf{e}_{n-1}^2) = 0$, that is, the vector $\mathbf{e}_{n-1}$ is lightlike.

Let us examine the subspace $\mathsf{L}' = \langle \mathbf{e}_n, \mathbf{e}_{n-1} \rangle$. It is obvious that it is invariant
with respect to the transformation $\mathcal{U}$, and since it contains two linearly independent
lightlike vectors $\mathbf{e}_n$ and $\mathbf{e}_{n-1}$, then by Lemma 7.53, the subspace $\mathsf{L}'$ is a
pseudo-Euclidean space, and we obtain the decomposition $\mathsf{L} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}$ as a direct sum
of two invariant subspaces. But this contradicts the fact that the space $\mathsf{L}$ is cyclic.
Therefore, the transformation $\mathcal{U}$ can have cyclic subspaces only of dimension 3.

Putting together cases 1, 2, and 3, and taking into account the induction hypoth- 
esis, we obtain the assertion of the theorem. □ 

Combining Theorems 7.27 and 7.65, we obtain the following corollary. 

Corollary 7.66 For every Lorentz transformation of a pseudo-Euclidean space, there exists
an orthonormal basis in which the matrix of the transformation has block-diagonal
form with blocks of the following types:

1. blocks of order 1 with elements $\pm 1$;

2. blocks of order 2 of type (7.29);

3. blocks of order 2 of type (7.91)-(7.92);

4. blocks of order 3 corresponding to a three-dimensional cyclic subspace with
eigenvalue $\pm 1$.

It follows from the law of inertia that the matrix of a Lorentz transformation can 
contain not more than one block of type 3 or 4. 

Let us note as well that a block of type 4 corresponding to a three-dimensional
cyclic subspace cannot be brought into Jordan normal form in an orthonormal basis.
Indeed, as we saw earlier, a block of type 4 is brought into Jordan normal form in the
basis (7.94), where the eigenvector $\mathbf{e}_n$ is lightlike, and therefore, it cannot belong to
any orthonormal basis.

With the proof of Theorem 7.65 we have established necessary conditions for a
Lorentz transformation to have a cyclic subspace: in particular, its dimension must
be 3, the corresponding eigenvalue must equal $\pm 1$, and the eigenvector must be lightlike.
Clearly, these necessary conditions are not sufficient, since in deriving them, we
used the equalities $(\mathbf{e}_i, \mathbf{e}_k) = (\mathcal{U}(\mathbf{e}_i), \mathcal{U}(\mathbf{e}_k))$ for only some of the vectors of the
basis (7.94). Let us show that Lorentz transformations with cyclic subspaces indeed
exist.


Example 7.67 Let us consider a vector space $\mathsf{L}$ of dimension $n = 3$. Let us choose
in $\mathsf{L}$ a basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ and define a transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ using relationships
(7.95) with the number $\lambda = \pm 1$. Then the matrix of the transformation $\mathcal{U}$ will take
the form of a Jordan block with eigenvalue $\lambda$.

Let us choose the Gram matrix for the basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ such that $\mathsf{L}$ is given the
structure of a pseudo-Euclidean space. With the proof of Theorem 7.65, we have found
the necessary conditions $(\mathbf{e}_2, \mathbf{e}_3) = 0$ and $(\mathbf{e}_3^2) = 0$. Let us set $(\mathbf{e}_1^2) = a$, $(\mathbf{e}_1, \mathbf{e}_2) = b$,
$(\mathbf{e}_1, \mathbf{e}_3) = c$, and $(\mathbf{e}_2^2) = d$. Then the Gram matrix can be written as

$$A = \begin{pmatrix} a & b & c \\ b & d & 0 \\ c & 0 & 0 \end{pmatrix}. \tag{7.98}$$

On the other hand, as we know (see Example 7.51, p. 270), in $\mathsf{L}$ there exists an
orthonormal basis in which the Gram matrix is diagonal and has determinant $-1$.
Since the sign of the determinant of the Gram matrix is one and the same for all
bases, it follows that $|A| = -c^2 d < 0$, that is, $c \neq 0$ and $d > 0$.

The conditions $c \neq 0$ and $d > 0$ are also sufficient for the vector space in which
the inner product is given by the Gram matrix $A$ in the form (7.98) to be a
pseudo-Euclidean space. Indeed, choosing a basis $\mathbf{g}_1, \mathbf{g}_2, \mathbf{g}_3$ in which the quadratic form
associated with the matrix $A$ has canonical form (6.28), we see that the condition
$|A| < 0$ is satisfied by, besides a pseudo-Euclidean space, only a space in which
$(\mathbf{g}_i^2) = -1$ for all $i = 1, 2, 3$. But such a quadratic form is negative definite, that is,
$(\mathbf{x}^2) < 0$ for all vectors $\mathbf{x} \neq \mathbf{0}$, and this contradicts that $(\mathbf{e}_2^2) = d > 0$.

Let us now consider the equalities $(\mathbf{e}_i, \mathbf{e}_k) = (\mathcal{U}(\mathbf{e}_i), \mathcal{U}(\mathbf{e}_k))$ for all indices $i \leq k$
from 1 to 3. Taking into account $\lambda^2 = 1$, $(\mathbf{e}_2, \mathbf{e}_3) = 0$, and $(\mathbf{e}_3^2) = 0$, we see that they
are satisfied automatically except for the cases $i = k = 1$ and $i = 1$, $k = 2$. These
two cases give the relationships $2\lambda b + d = 0$ and $c + d = 0$. Thus we may choose
the number $a$ arbitrarily, the number $d$ to be any positive number, and set $c = -d$
and $b = -\lambda d/2$. It is also not difficult to ascertain that linearly independent vectors
$\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ satisfying such conditions in fact exist.
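
The conclusion of Example 7.67 lends itself to a direct check. The sketch below assumes the matrix convention in which column $i$ holds the coordinates of the image of $\mathbf{e}_i$, builds the Gram matrix (7.98) with $c = -d$ and $b = -\lambda d/2$, and tests the relationship $U^{*} A U = A$ for the $3 \times 3$ Jordan block with $\lambda = \pm 1$. The helper names `gram`, `jordan`, `preserves`, and `det3` are illustrative and not taken from the text.

```python
from fractions import Fraction

def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(p):
    return [list(row) for row in zip(*p)]

def gram(a, d, lam):
    """Gram matrix (7.98) with c = -d and b = -lam*d/2."""
    a, d = Fraction(a), Fraction(d)
    b, c = -lam * d / 2, -d
    return [[a, b, c], [b, d, 0], [c, 0, 0]]

def jordan(lam):
    """Jordan block (7.96) for n = 3; column i holds the coordinates of U(e_i)."""
    return [[lam, 0, 0], [1, lam, 0], [0, 1, lam]]

def preserves(u, g):
    """True if U^T G U = G, i.e. U preserves the inner product with Gram matrix G."""
    return matmul(transpose(u), matmul(g, u)) == g

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

for lam in (1, -1):
    for a, d in ((7, 3), (0, 1), (-5, Fraction(1, 2))):
        A = gram(a, d, lam)
        assert preserves(jordan(lam), A)       # the Jordan block is a Lorentz transformation
        assert det3(A) == -Fraction(d) ** 3    # |A| = -c^2 d = -d^3 < 0

# Pairing the block with the Gram matrix built for the other eigenvalue breaks 2*lam*b + d = 0.
assert not preserves(jordan(1), gram(7, 3, -1))
```

The last assertion illustrates that the relationship $2\lambda b + d = 0$ is genuinely needed: with the wrong sign of $b$ the inner product is no longer preserved.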

Just as in a Euclidean space, the presence of different orientations of a
pseudo-Euclidean space determined by the value of $\varepsilon(\mathcal{U})$ for the Lorentz transformation
$\mathcal{U}$ is connected with the concept of continuous deformation of a transformation
(p. 230), which defines an equivalence relation on the set of transformations.

Let $\mathcal{U}_t$ be a family of Lorentz transformations continuously depending on the
parameter $t$. Then $|\mathcal{U}_t|$ also depends continuously on $t$, and since the determinant of
a Lorentz transformation is equal to $\pm 1$, the value of $|\mathcal{U}_t|$ is constant for all $t$. Thus
Lorentz transformations with determinants having opposite signs cannot be
continuously deformed into each other. But in contrast to orthogonal transformations of a
Euclidean space, Lorentz transformations $\mathcal{U}_t$ have an additional characteristic, the
number $\nu(\mathcal{U}_t)$ (see the definition on p. 276). Let us show that like the determinant
$|\mathcal{U}_t|$, the number $\nu(\mathcal{U}_t)$ is also constant.

To this end, let us choose an arbitrary timelike vector $\mathbf{e}$ and make use of
Lemma 7.59. The vector $\mathcal{U}_t(\mathbf{e})$ is also timelike, and moreover, $\nu(\mathcal{U}_t) = +1$ if $\mathbf{e}$ and
$\mathcal{U}_t(\mathbf{e})$ lie inside one pole of the light cone, that is, $(\mathbf{e}, \mathcal{U}_t(\mathbf{e})) < 0$, and $\nu(\mathcal{U}_t) = -1$
if $\mathbf{e}$ and $\mathcal{U}_t(\mathbf{e})$ lie inside different poles, that is, $(\mathbf{e}, \mathcal{U}_t(\mathbf{e})) > 0$. It then remains to
observe that the function $(\mathbf{e}, \mathcal{U}_t(\mathbf{e}))$ depends continuously on the argument $t$, and
therefore can change sign only if for some value of $t$, it assumes the value zero. But
from inequality (7.82) for timelike vectors $\mathbf{x} = \mathbf{e}$ and $\mathbf{y} = \mathcal{U}_t(\mathbf{e})$, there follows the
inequality

$$(\mathbf{e}, \mathcal{U}_t(\mathbf{e}))^2 \geq (\mathbf{e}^2) \cdot (\mathcal{U}_t(\mathbf{e})^2) > 0,$$

showing that $(\mathbf{e}, \mathcal{U}_t(\mathbf{e}))$ cannot be zero for any value of $t$.

Thus taking into account Theorem 7.63, we see that the number of equivalence
classes of Lorentz transformations is certainly not less than four. Now we shall
show that there are exactly four. To begin with, we shall establish this for a
pseudo-Euclidean plane, and thereafter shall prove it for a pseudo-Euclidean space of
arbitrary dimension.

Example 7.68 The matrices (7.91), (7.92) presenting all possible Lorentz
transformations of a pseudo-Euclidean plane can be continuously deformed into the matrices

$$E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
F_1 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
F_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
F_3 = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \tag{7.99}$$

respectively. Indeed, we obtain the necessary continuous deformation if in the
matrices (7.91), (7.92) we replace the parameter $\psi$ by $(1 - t)\psi$, where $t \in [0, 1]$. It is
also clear that none of the four matrices (7.99) can be continuously deformed into
any of the others: any two of them differ either by the signs of their determinants
or in that one of them preserves the poles of the light cone, while the other causes
them to exchange places.

In the general case, we have an analogue of Theorem 7.28.

Theorem 7.69 Two Lorentz transformations $\mathcal{U}_1$ and $\mathcal{U}_2$ of a real
pseudo-Euclidean space are continuously deformable into each other if and only if
$\varepsilon(\mathcal{U}_1) = \varepsilon(\mathcal{U}_2)$.


Proof Just as in the case of Theorem 7.28, we begin with a more specific assertion:
we shall show that an arbitrary Lorentz transformation $\mathcal{U}$ for which

$$\varepsilon(\mathcal{U}) = (|\mathcal{U}|, \nu(\mathcal{U})) = (+1, +1) \tag{7.100}$$

holds can be continuously deformed into $\mathcal{E}$. Invoking Theorem 7.65, let us examine
the orthogonal decomposition (7.93), denoting by $\mathcal{U}_i$ the restriction of the
transformation $\mathcal{U}$ to the invariant subspace $\mathsf{L}_i$, where $i = 0, 1$. We shall investigate three
cases in turn.

Case 1. In the decomposition (7.93), the dimension of the subspace $\mathsf{L}_1$ is equal to
1, that is, $\mathsf{L}_1 = \langle \mathbf{e} \rangle$, where $(\mathbf{e}^2) < 0$. Then to the subspace $\mathsf{L}_1$, there corresponds
in the matrix of the transformation $\mathcal{U}$ a block of order 1 with $\sigma = +1$ or $-1$,
and $\mathcal{U}_0$ is an orthogonal transformation that, depending on the sign of $\sigma$, can be
proper or improper, so that the condition $|\mathcal{U}| = \sigma|\mathcal{U}_0| = 1$ is satisfied. However,
it is easy to see that for $\sigma = -1$, we have $\nu(\mathcal{U}) = -1$ (since $(\mathbf{e}, \mathcal{U}(\mathbf{e})) > 0$), and
therefore, the condition (7.100) leaves only the case $\sigma = +1$, and consequently, the
orthogonal transformation $\mathcal{U}_0$ is proper. Then $\mathcal{U}_1$ is the identity transformation (of
a one-dimensional space). By Theorem 7.28, the orthogonal transformation $\mathcal{U}_0$ is
continuously deformable into the identity, and therefore, the transformation $\mathcal{U}$ is
continuously deformable into $\mathcal{E}$.

Case 2. In the decomposition (7.93), the dimension of the subspace $\mathsf{L}_1$ is equal to
2, that is, $\mathsf{L}_1$ is a pseudo-Euclidean plane. Then as we established in Examples 7.62
and 7.68, in some orthonormal basis of the plane $\mathsf{L}_1$, the matrix of the transformation
$\mathcal{U}_1$ has the form (7.91) or (7.92) and is continuously deformable into one of the four matrices
(7.99). It is obvious that the condition $\nu(\mathcal{U}_1) = 1$ is associated with only the matrix
$E$ and one of the matrices $F_2$, $F_3$, namely the one in which the eigenvalues $\pm 1$
correspond to the eigenvectors $\mathbf{g}_{\pm}$ in such a way that $(\mathbf{g}_+^2) < 0$ and $(\mathbf{g}_-^2) > 0$. In
this case, it is obvious that we have the orthogonal decomposition $\mathsf{L}_1 = \langle \mathbf{g}_+ \rangle \oplus \langle \mathbf{g}_- \rangle$.

If the matrix of the transformation $\mathcal{U}_1$ is continuously deformable into $E$, then
the orthogonal transformation $\mathcal{U}_0$ is proper, and it follows that it is also continuously
deformable into the identity, which proves our assertion.

If the matrix of the transformation $\mathcal{U}_1$ is continuously deformable into $F_2$ or
$F_3$, then the orthogonal transformation $\mathcal{U}_0$ is improper, and consequently, its matrix
is continuously deformable into the matrix (7.32), which has the eigenvalue $-1$
corresponding to some eigenvector $\mathbf{h} \in \mathsf{L}_0$. From the orthogonal decomposition
$\mathsf{L} = \mathsf{L}_0 \oplus \langle \mathbf{g}_+ \rangle \oplus \langle \mathbf{g}_- \rangle$, taking into account $(\mathbf{g}_+^2) < 0$, it follows that the invariant plane
$\mathsf{L}' = \langle \mathbf{g}_-, \mathbf{h} \rangle$ is a Euclidean space. The matrix of the restriction of $\mathcal{U}$ to $\mathsf{L}'$ is equal
to $-E$, and is therefore continuously deformable into $E$. And this implies that the
transformation $\mathcal{U}$ is continuously deformable into $\mathcal{E}$.

Case 3. In the decomposition (7.93), the subspace $\mathsf{L}_1$ is a cyclic three-dimensional
pseudo-Euclidean space with eigenvalue $\lambda = \pm 1$. This case was examined in detail
in Example 7.67, and we will use the notation introduced there. It is obvious that the
condition $\nu(\mathcal{U}_1) = 1$ is satisfied only for $\lambda = 1$, since otherwise, the transformation
$\mathcal{U}_1$ takes the lightlike eigenvector $\mathbf{e}_3$ to $-\mathbf{e}_3$, that is, it transposes the poles of the
light cone. Thus condition (7.100) corresponds to the Lorentz transformation $\mathcal{U}_1$
with the value $\varepsilon(\mathcal{U}_1) = (+1, +1)$ and proper orthogonal transformation $\mathcal{U}_0$.

Let us show that such a transformation $\mathcal{U}_1$ is continuously deformable into the
identity. Since $\mathcal{U}_0$ is obviously also continuously deformable into the identity, this
will give us the required assertion.

Thus let $\lambda = 1$. We shall fix in $\mathsf{L}_1$ a basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ satisfying the following
conditions introduced in Example 7.67:

$$(\mathbf{e}_1^2) = a, \qquad (\mathbf{e}_1, \mathbf{e}_2) = -\frac{d}{2}, \qquad (\mathbf{e}_1, \mathbf{e}_3) = -d, \qquad
(\mathbf{e}_2^2) = d, \qquad (\mathbf{e}_2, \mathbf{e}_3) = (\mathbf{e}_3^2) = 0 \tag{7.101}$$

with some numbers $a$ and $d > 0$. The Gram matrix $A$ in this basis has the form
(7.98) with $c = -d$ and $b = -d/2$, while the matrix $U_1$ of the transformation $\mathcal{U}_1$
has the form of a Jordan block.



Let $\mathcal{U}_t$ be a linear transformation of the space $\mathsf{L}_1$ whose matrix in the basis
$\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ has the form

$$U_t = \begin{pmatrix} 1 & 0 & 0 \\ t & 1 & 0 \\ \varphi(t) & t & 1 \end{pmatrix}, \tag{7.102}$$

where $t$ is a real parameter taking values from 0 to 1, and $\varphi(t)$ is a continuous
function of $t$ that we shall choose in such a way that $\mathcal{U}_t$ is a Lorentz transformation. As
we know, for this, the relationship (7.85) with matrix $U = U_t$ must be satisfied.
Substituting in the equality $U_t^{*} A U_t = A$ the matrix $A$ of the form (7.98) with $c = -d$
and $b = -d/2$ and the matrix $U_t$ of the form (7.102) and equating corresponding
elements on the left- and right-hand sides, we obtain that the equality $U_t^{*} A U_t = A$
holds if $\varphi(t) = t(t - 1)/2$. For such a choice of function $\varphi(t)$, we obtain a family
of Lorentz transformations $\mathcal{U}_t$ depending continuously on the parameter $t \in [0, 1]$.
Moreover, it is obvious that for $t = 1$, the matrix $U_t$ is the Jordan block $U_1$, while
for $t = 0$, the matrix $U_t$ equals $E$. Thus the family $\mathcal{U}_t$ effects a continuous
deformation of the transformation $\mathcal{U}_1$ into $\mathcal{E}$.
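
The choice of $\varphi(t)$ can be followed entry by entry. As a sketch, read the columns of the matrix (7.102) as the images of the basis vectors and expand the inner products using the conditions of Example 7.67 with $\lambda = 1$, that is, $b = -d/2$, $c = -d$, $(\mathbf{e}_2, \mathbf{e}_3) = (\mathbf{e}_3^2) = 0$; no notation beyond that of the text is assumed.

```latex
\begin{align*}
\mathcal{U}_t(\mathbf{e}_1) &= \mathbf{e}_1 + t\,\mathbf{e}_2 + \varphi(t)\,\mathbf{e}_3, \qquad
\mathcal{U}_t(\mathbf{e}_2) = \mathbf{e}_2 + t\,\mathbf{e}_3, \qquad
\mathcal{U}_t(\mathbf{e}_3) = \mathbf{e}_3, \\
(\mathcal{U}_t(\mathbf{e}_1), \mathcal{U}_t(\mathbf{e}_1))
  &= a + 2tb + 2\varphi(t)c + t^2 d
   = a + \bigl(t^2 - t - 2\varphi(t)\bigr)d, \\
(\mathcal{U}_t(\mathbf{e}_1), \mathcal{U}_t(\mathbf{e}_2)) &= b + t(c + d) = b, \qquad
(\mathcal{U}_t(\mathbf{e}_1), \mathcal{U}_t(\mathbf{e}_3)) = c, \\
(\mathcal{U}_t(\mathbf{e}_2), \mathcal{U}_t(\mathbf{e}_2)) &= d, \qquad
(\mathcal{U}_t(\mathbf{e}_2), \mathcal{U}_t(\mathbf{e}_3)) = 0, \qquad
(\mathcal{U}_t(\mathbf{e}_3), \mathcal{U}_t(\mathbf{e}_3)) = 0.
\end{align*}
```

All entries of the Gram matrix are thus reproduced automatically except the first, which equals $a$ for every $d > 0$ exactly when $t^2 - t - 2\varphi(t) = 0$, that is, when $\varphi(t) = t(t - 1)/2$. In particular $\varphi(0) = \varphi(1) = 0$, in agreement with the endpoints of the family.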

Now let us prove the assertion of Theorem 7.69 in general form. Let $\mathcal{W}$ be a
Lorentz transformation with arbitrary $\varepsilon(\mathcal{W})$. We shall show that it can be
continuously deformed into the transformation $\mathcal{F}$, having in some orthonormal basis the
block-diagonal matrix

$$F = \begin{pmatrix} E & 0 \\ 0 & F' \end{pmatrix},$$

where $E$ is the identity matrix of order $n - 2$ and $F'$ is one of the four matrices
(7.99). It is obvious that by choosing a suitable matrix $F'$ we may obtain the Lorentz
transformation $\mathcal{F}$ with any desired $\varepsilon(\mathcal{F})$. Let us select the matrix $F'$ in such a way
that $\varepsilon(\mathcal{F}) = \varepsilon(\mathcal{W})$.

Let us select in our space an arbitrary orthonormal basis, and in that basis, let
the transformation $\mathcal{W}$ have matrix $W$. Then the transformation $\mathcal{U}$ having in this
same basis the matrix $U = WF$ is a Lorentz transformation, and moreover, by our
choice of $\varepsilon(\mathcal{F}) = \varepsilon(\mathcal{W})$, we have the equality $\varepsilon(\mathcal{U}) = \varepsilon(\mathcal{W})\varepsilon(\mathcal{F}) = (+1, +1)$.
Further, from the trivially verified relationship $F^{-1} = F$, we obtain $W = UF$, that is,
$\mathcal{W} = \mathcal{U}\mathcal{F}$. We shall now make use of a family $\mathcal{U}_t$ that effects the continuous
deformation of the transformation $\mathcal{U}$ into $\mathcal{E}$. From the equality $\mathcal{W} = \mathcal{U}\mathcal{F}$, with the help
of Lemma 4.37, we obtain the relationship $\mathcal{W}_t = \mathcal{U}_t\mathcal{F}$, in which $\mathcal{W}_0 = \mathcal{E}\mathcal{F} = \mathcal{F}$
and $\mathcal{W}_1 = \mathcal{U}\mathcal{F} = \mathcal{W}$. Thus it is this family $\mathcal{W}_t = \mathcal{U}_t\mathcal{F}$ that accomplishes the
deformation of the Lorentz transformation $\mathcal{W}$ into $\mathcal{F}$.

If $\mathcal{U}_1$ and $\mathcal{U}_2$ are Lorentz transformations such that $\varepsilon(\mathcal{U}_1) = \varepsilon(\mathcal{U}_2)$, then by
what we showed earlier, each of them is continuously deformable into $\mathcal{F}$ with one
and the same matrix $F'$. Consequently, by transitivity, the transformations $\mathcal{U}_1$ and
$\mathcal{U}_2$ are continuously deformable into each other. □


Similarly to what we did in Sects. 4.4 and 7.3 for nonsingular and orthogonal
transformations, we can express the fact established by Theorem 7.69 in topological
form: the set of Lorentz transformations of a pseudo-Euclidean space of a given
dimension has exactly four path-connected components. They correspond to the four
possible values of $\varepsilon(\mathcal{U})$.

Let us note that the existence of four (instead of two) orientations is not a specific
property of pseudo-Euclidean spaces with the quadratic form (7.76), as was the case
with the majority of properties of this section. It holds for all vector spaces with a
bilinear inner product $(\mathbf{x}, \mathbf{y})$, provided that it is nonsingular and the quadratic form
$(\mathbf{x}^2)$ is neither positive nor negative definite. We can indicate (without pretending
to provide a proof) the reason for this phenomenon. If the form $(\mathbf{x}^2)$, in canonical
form, appears as

$$x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_n^2, \qquad \text{where } s \in \{1, \ldots, n - 1\},$$

then the transformations that preserve it include, first of all, the orthogonal
transformations preserving the form $x_1^2 + \cdots + x_s^2$ and not changing the coordinates
$x_{s+1}, \ldots, x_n$, and secondly, the transformations preserving the quadratic form
$x_{s+1}^2 + \cdots + x_n^2$ and not changing the coordinates $x_1, \ldots, x_s$. Every type of
transformation is “responsible” for its own orientation.


Chapter 8 

Affine Spaces 


The usual objects of study in plane and solid geometry are the plane and
three-dimensional space, both of which consist of points. However, vector spaces are
logically simpler, and therefore, we began by studying them. Now we can move
on to “point” (affine) spaces. The theory of such spaces is closely related to that
of vector spaces, and so in this chapter, we shall be concerned only with questions
relating specifically to this case.


8.1 The Definition of an Affine Space

Let us return to the starting point in the theory of vector spaces, namely to Sect. 3.1.
There, we said that two points in the plane (or in space) determine a vector. We shall
make this property the basis of the axiomatic definition of affine spaces.


Definition 8.1 An affine space is a pair $(V, \mathsf{L})$ consisting of a set $V$ (whose elements
are called points) and a vector space $\mathsf{L}$, on which a rule is defined whereby two points
$A, B \in V$ are associated with a vector of the space $\mathsf{L}$, which we shall denote by $\overrightarrow{AB}$
(the order of the points $A$ and $B$ is significant). Here the following conditions must
be satisfied:

(1) $\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$.

(2) For every three points $A, B, C \in V$, there exists a unique point $D \in V$ such that

$$\overrightarrow{AB} = \overrightarrow{CD}. \tag{8.1}$$

(3) For every two points $A, B \in V$ and scalar $\alpha$, there exists a unique point $C \in V$
such that

$$\overrightarrow{AC} = \alpha\,\overrightarrow{AB}. \tag{8.2}$$


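Remark 8.2 From condition (2), it follows that we also have $\overrightarrow{AC} = \overrightarrow{BD}$. Indeed, in view of condition (1), we have the equalities $\overrightarrow{AB} + \overrightarrow{BD} = \overrightarrow{AD}$ and $\overrightarrow{AC} + \overrightarrow{CD} = \overrightarrow{AD}$. This implies that $\overrightarrow{AB} + \overrightarrow{BD} = \overrightarrow{AC} + \overrightarrow{CD}$ (see Fig. 8.1). Since $\overrightarrow{AB} = \overrightarrow{CD}$ by assumption, and all vectors belong to the space $L$, it follows that $\overrightarrow{AC} = \overrightarrow{BD}$.

Fig. 8.1 Equality of vectors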

From these conditions and the definition of a vector space, it is easy to derive that for an arbitrary point $A \in V$, the vector $\overrightarrow{AA}$ is equal to $0$, and for every pair of points $A, B \in V$, we have the equality

$$\overrightarrow{BA} = -\overrightarrow{AB}.$$

It is equally easy to verify that if we are given a point $A \in V$ and a vector $x = \overrightarrow{AB}$ in the space $L$, then the point $B \in V$ is thereby uniquely determined.


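Theorem 8.3 The totality of all vectors of the form $\overrightarrow{AB}$, where $A$ and $B$ are arbitrary points of $V$, forms a subspace $L'$ of the space $L$.

Proof Let $x = \overrightarrow{AB}$, $y = \overrightarrow{CD}$. By condition (2), there exists a point $K$ such that $\overrightarrow{BK} = \overrightarrow{CD}$. Then by condition (1), the vector

$$\overrightarrow{AK} = \overrightarrow{AB} + \overrightarrow{BK} = \overrightarrow{AB} + \overrightarrow{CD} = x + y$$

is again contained in the subspace $L'$. Analogously, for any vector $x = \overrightarrow{AB}$ in $L'$, condition (3) gives the vector $\overrightarrow{AC} = \alpha \overrightarrow{AB} = \alpha x$, which consequently also is contained in $L'$. □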


In view of Theorem 8.3, we shall require for the study of an affine space $(V, L)$ not all the vectors of the space $L$, but only those that lie in the subspace $L'$. Therefore, in what follows, we shall denote the space $L'$ by $L$. In other words, we shall assume that the following condition is satisfied: for every vector $x \in L$, there exist points $A$ and $B$ in $V$ such that $x = \overrightarrow{AB}$.

This condition does not impose any additional constraints. It is simply equivalent to a change of notation: $L$ instead of $L'$.

Example 8.4 Every vector space $L$ defines an affine space $(L, L)$ if for two vectors $a, b \in L$ considered as points of the set $V = L$, we set $\overrightarrow{ab} = b - a$. In particular, the totality $\mathbb{K}^n$ of all rows of length $n$ defines an affine space.

Example 8.5 The plane and space studied in a course in elementary or analytic geometry are examples of affine spaces.
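
As an illustration of Example 8.4, the following minimal sketch checks conditions (1) to (3) of Definition 8.1 and Remark 8.2 on a few sample points of the affine space $(\mathbb{Q}^2, \mathbb{Q}^2)$. The helper names `vec`, `add`, `scale`, and `point_plus` and the sample points are assumptions made for this illustration only; a numerical check on particular points is of course not a proof.

```python
from fractions import Fraction


def vec(a, b):
    """Vector AB = b - a attached to the points a, b of K^n (Example 8.4)."""
    return tuple(y - x for x, y in zip(a, b))


def add(u, v):
    """Sum of two vectors."""
    return tuple(x + y for x, y in zip(u, v))


def scale(alpha, u):
    """Product of the scalar alpha and the vector u."""
    return tuple(alpha * x for x in u)


def point_plus(a, u):
    """The unique point B with vec(a, B) == u."""
    return tuple(x + y for x, y in zip(a, u))


A = (Fraction(1), Fraction(2))
B = (Fraction(4), Fraction(0))
C = (Fraction(-1), Fraction(5))

# Condition (1): AB + BC = AC.
cond1 = add(vec(A, B), vec(B, C)) == vec(A, C)

# Condition (2): the point D with AB = CD; Remark 8.2 then gives AC = BD.
D = point_plus(C, vec(A, B))
cond2 = vec(C, D) == vec(A, B)
remark = vec(A, C) == vec(B, D)

# Condition (3): the point E with AE = alpha * AB.
alpha = Fraction(1, 3)
E = point_plus(A, scale(alpha, vec(A, B)))
cond3 = vec(A, E) == scale(alpha, vec(A, B))
```

Exact rational arithmetic is used so that the equalities are tested without rounding error.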

Condition (2) in the definition of an affine space shows that no matter how we choose the point $O$ in the set $V$, every vector $x \in L$ can be represented as $x = \overrightarrow{OA}$. Moreover, from the requirement of the uniqueness of the point $D$ in condition (2), it follows that for a designated point $O$ and vector $x$, the point $A$ is uniquely determined by the condition $\overrightarrow{OA} = x$. Thus having chosen (arbitrarily) a point $O \in V$ and associating with each point $A \in V$ the vector $\overrightarrow{OA}$, we obtain a bijection between the points $A$ of the set $V$ and the vectors $x$ of the space $L$. In other words, an affine space is a vector space in which the coordinate origin $O$ is not fixed. This notion is more natural from a physical point of view; in an affine space, all points are created equal, or in other words, the space is uniform. Mathematically, such a notion seems more complex: we need to specify not one, but two sets: $V$ and $L$. And though we write an affine space as a pair $(V, L)$, we shall often denote such a space simply by $V$, leaving $L$ implied and assuming that the condition formulated above is satisfied. In this case, we shall call $L$ the space of vectors of the affine space $V$.

Definition 8.6 The dimension of an affine space $(V, L)$ is the dimension of the associated vector space $L$. When we wish to focus our attention on the space $V$, then we shall denote the dimension by $\dim V$.

In the sequel, we shall consider only spaces of finite dimension. We shall call an affine space of dimension 1 a line, and an affine space of dimension 2, a plane.

Having selected the point $O \in V$, we obtain a bijection $V \to L$. If $\dim L = n$ and we choose in the space $L$ some basis $e_1, \ldots, e_n$, then we have the isomorphism $L \simeq \mathbb{K}^n$. Thus for an arbitrary choice of a point $O \in V$ and basis in $L$, we obtain a bijection $V \to \mathbb{K}^n$ and define each point $A$ of the affine space $V$ by the set of coordinates $(\alpha_1, \ldots, \alpha_n)$ of the vector $x = \overrightarrow{OA}$ in the basis $e_1, \ldots, e_n$.

Definition 8.7 The point $O$ and basis $e_1, \ldots, e_n$ together are called a frame of reference in the space $V$, and we write $(O; e_1, \ldots, e_n)$. The $n$-tuple $(\alpha_1, \ldots, \alpha_n)$ associated with the point $A \in V$ is called the coordinates of the point $A$ in the associated frame of reference.

If relative to the frame of reference $(O; e_1, \ldots, e_n)$, the point $A$ has coordinates $(\alpha_1, \ldots, \alpha_n)$, while the point $B$ has coordinates $(\beta_1, \ldots, \beta_n)$, then the vector $\overrightarrow{AB}$ has, with respect to the basis $e_1, \ldots, e_n$, coordinates $(\beta_1 - \alpha_1, \ldots, \beta_n - \alpha_n)$.

Just as with the selection of a basis in a vector space, every vector of that space is determined by its coordinates, likewise is every point of an affine space determined by its coordinates in a given frame of reference. Thus a frame of reference plays the same role in the theory of affine spaces as that played by a basis in the theory of vector spaces. We have defined a frame of reference as a collection consisting of the point $O$ and $n$ vectors $e_1, \ldots, e_n$ that form a basis of $L$. Any of these vectors $e_i$ can be written in the form $e_i = \overrightarrow{OA_i}$, and then it is possible to give the frame of reference as a collection of $n + 1$ points $O, A_1, \ldots, A_n$. Here the points $O, A_1, \ldots, A_n$ are not arbitrary; they must satisfy the property that the vectors $\overrightarrow{OA_1}, \ldots, \overrightarrow{OA_n}$ form a basis of $L$, that is, they must be linearly independent.

We have seen that the choice of a point $O$ in $V$ determines an isomorphism between $V$ and $L$ that assigns to each point $A \in V$ the vector $\overrightarrow{OA} \in L$. Let us consider how this correspondence changes when we change the point $O$. If we began with the point $O'$, then we will have placed in correspondence with the point $A$ the vector $\overrightarrow{O'A}$, which, by definition of an affine space, is equal to $\overrightarrow{O'O} + \overrightarrow{OA}$. Thus if in the first case, we assign to the point $A$ the vector $x$, then in the second, we assign the vector $x + a$, where $a = \overrightarrow{O'O}$. We obtain a corresponding mapping of the set $V$ if to the point $A$, we assign the point $B$ such that $\overrightarrow{AB} = a$. Such a point $B$ is uniquely determined by the choice of $A$ and $a$.

Definition 8.8 A translation of an affine space $(V, L)$ by a vector $a \in L$ is a mapping of the set $V$ into itself that assigns to the point $A$ the point $B$ such that $\overrightarrow{AB} = a$. (The existence and uniqueness of such a point $B \in V$ for every $A \in V$ and $a \in L$ follows from the definition of affine space.)

We shall denote the translation by the vector $a$ by $T_a$. Thus the definition of a translation can be written as the formula

$$T_a(A) = B, \quad \text{where } \overrightarrow{AB} = a.$$

From the given definition, a translation is an isomorphism of the set $V$ into itself. It can be depicted with the help of the diagram

[Diagram (8.3): the translation $T_a : V \to V$ together with the bijections $\psi, \psi' : V \to L$]

where the bijection $\psi$ between $V$ and $L$ is defined using the point $O$, while the bijection $\psi'$ uses the point $O'$, and $T_a$ is a translation by the vector $a = \overrightarrow{O'O}$. As a result, the mapping $\psi'$ is the product (sequential application, or composition) of the mappings $T_a$ and $\psi$: indeed, if $T_a(A) = B$, then $\psi(B) = \overrightarrow{OA} + \overrightarrow{AB} = \psi(A) + a = \psi'(A)$. This relationship can be more briefly written as $\psi' = \psi + a$.

Proposition 8.9 Translations possess the following properties:

(1) $T_a T_b = T_{a+b}$,
(2) $T_0 = e$, the identity mapping,
(3) $T_{-a} = T_a^{-1}$.

Proof In property (1), the left-hand side consists of the product of mappings, which means that for every point $C \in V$, the equality

$$T_a(T_b(C)) = T_{a+b}(C) \tag{8.4}$$

is satisfied. Let us represent the vector $b$ in the form $b = \overrightarrow{CP}$ (not only is this possible, but by the definition of affine space, the point $P \in V$ is uniquely determined). Then we have the equality $T_b(C) = P$. Likewise, let us represent the vector $a$ in the form $a = \overrightarrow{PQ}$. Then analogously, $T_a(P) = Q$. It follows from these relationships that

$$a + b = \overrightarrow{CP} + \overrightarrow{PQ} = \overrightarrow{CQ},$$

from which we obviously obtain $T_{a+b}(C) = Q$. On the other hand, we have the equality $T_a(T_b(C)) = T_a(P) = Q$, which proves the relationship (8.4).

Properties (2) and (3) can be proved even more easily. □
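
The relationship (8.4) can be checked on a concrete case. The sketch below assumes the affine space $(\mathbb{Q}^2, \mathbb{Q}^2)$ of Example 8.4, in which the translation $T_a$ simply adds the row $a$ to a point; the function name `translate` and the sample data are hypothetical.

```python
def translate(a):
    """Return T_a on K^n: the map sending a point A to the point B with AB = a."""
    return lambda point: tuple(p + x for p, x in zip(point, a))


a = (2, -1)
b = (5, 3)
C = (0, 7)

# Left-hand side of (8.4): first apply T_b, then T_a.
lhs = translate(a)(translate(b)(C))

# Right-hand side of (8.4): a single translation by a + b.
a_plus_b = tuple(x + y for x, y in zip(a, b))
rhs = translate(a_plus_b)(C)
```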

Let us note that for any two points $A, B \in V$, there exists a unique vector $a \in L$ for which $T_a(A) = B$, namely, the vector $a = \overrightarrow{AB}$.

Suppose that we are given a certain frame of reference $(O; e_1, \ldots, e_n)$. Relative to this frame of reference, every point $A \in V$ has coordinates $(x_1, \ldots, x_n)$. A function $F(A)$ defined on the affine space $V$ and taking numeric values is called a polynomial if it can be written as a polynomial in the coordinates $x_1, \ldots, x_n$.

This definition can be given a different formulation. Let us denote by $\psi : V \to L$ the bijection between $V$ and $L$ determined by the selection of an arbitrary point $O$. Then the function $F$ on $V$ is a polynomial if it can be represented in the form $F(A) = G(\psi(A))$, where $G(x)$ is a polynomial on the space $L$ (see the definition on p. 127). To be sure, it is still necessary to verify that this definition does not depend on the choice of frame of reference $(O; e_1, \ldots, e_n)$, but this can be done very easily. If $\psi' : V \to L$ is a bijection between $V$ and $L$ determined by the choice of point $O'$ (cf. diagram (8.3)), then $\psi' = \psi + a$. As we saw in Sect. 3.8, the property of a function $G(x)$ being a polynomial does not depend on the choice of basis in $L$, and it remains to verify that for a polynomial $G(x)$ and vector $a \in L$, the function $G(x + a)$ is also a polynomial. It is clearly sufficient to verify this for the monomial $c x_1^{k_1} \cdots x_n^{k_n}$. If the vector $x$ has coordinates $x_1, \ldots, x_n$, and the vector $a$ has coordinates $a_1, \ldots, a_n$, then substituting them into the monomial $c x_1^{k_1} \cdots x_n^{k_n}$, we obtain the expression $c(x_1 + a_1)^{k_1} \cdots (x_n + a_n)^{k_n}$, which is clearly also a polynomial in the variables $x_1, \ldots, x_n$.

Using the same considerations as those employed in Example 3.86 on p. 130, we may define for an arbitrary polynomial $F$ on an affine space $V$ its differential $d_O F$ at an arbitrary point $O \in V$. Here the differential $d_O F$ will be a linear function on the space of vectors $L$ of the space $V$; that is, it will be a vector in the dual space $L^*$. Indeed, let us consider the bijection $\psi : V \to L$ constructed earlier, for which $\psi(O) = 0$; let us represent $F$ in the form $F(A) = G(\psi(A))$, where $G(x)$ is some polynomial on the vector space $L$; and let us define $d_O F = d_0 G$ as a linear function on $L$.
Suppose that we are given the frame of reference $(O; e_1, \ldots, e_n)$ in the space $V$. Then $F(A)$ is a polynomial in the coordinates of the point $A$ with respect to this frame of reference. Let us write down the expression $d_O F$ in these coordinates. By definition, the differential

$$d_O F = d_0 G = \sum_{i=1}^{n} \frac{\partial G}{\partial x_i}(0)\, x_i$$

is a linear function in the coordinates $x_1, \ldots, x_n$ with respect to the basis $e_1, \ldots, e_n$. Here $\partial G / \partial x_i$ is a polynomial, and it corresponds to some polynomial $\Phi_i$ on $V$, that is, it has the form $\Phi_i(A) = \frac{\partial G}{\partial x_i}(\psi(A))$. By definition, we set $\Phi_i = \partial F / \partial x_i$. It is easy to verify that if we express $F$ and $\Phi_i$ as polynomials in $x_1, \ldots, x_n$, then $\Phi_i$ will indeed be the partial derivative of $F$ with respect to the variable $x_i$. Since $\psi(O) = 0$, it follows that $\frac{\partial G}{\partial x_i}(0) = \frac{\partial F}{\partial x_i}(O)$. Consequently, we obtain for the differential $d_O F$ the expression

$$d_O F = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(O)\, x_i,$$

which is similar to formula (3.70) obtained in Sect. 3.8.


8.2 Affine Spaces 

Definition 8.10 A subset $V' \subset V$ of an affine space $(V, L)$ is an affine subspace if the set of vectors $\overrightarrow{AB}$ for all $A, B \in V'$ forms a vector subspace $L'$ of the vector space $L$.

It is obvious that then $V'$ itself is an affine space, and $L'$ is its space of vectors. If $\dim V' = \dim V - 1$, then $V'$ is called a hyperplane in $V$.

Example 8.11 A typical example of an affine subspace is the set $V'$ of solutions of the system of linear equations (1.3). If the coefficients $a_{ij}$ and constants $b_i$ of the system of equations (1.3) lie in the field $\mathbb{K}$, then the set of solutions $V'$ is contained in the set of rows $\mathbb{K}^n$ of length $n$, which we view as an affine space $(\mathbb{K}^n, \mathbb{K}^n)$, that is, $V = \mathbb{K}^n$ and $L = \mathbb{K}^n$.

For a proof of the fact that the solution set $V'$ is an affine subspace, let us verify that its space of vectors $L'$ is the solution space of the homogeneous system of linear equations associated with (1.3). That the set of solutions of a linear homogeneous system is a vector subspace of $\mathbb{K}^n$ was established in Sect. 3.1 (Example 3.8). Let the rows $x$ and $y$ be solutions of the system (1.3), viewed now as points of the affine space $V = \mathbb{K}^n$. We must verify that the vector $\overrightarrow{xy}$ defined as in the above example is contained in $L'$. But in accordance with this example, we must set $\overrightarrow{xy} = y - x$, and it then remains for us to verify that the row $y - x$ belongs to the subspace $L'$, that is, it is a solution of the homogeneous system associated with the system (1.3). It suffices to verify this property separately for each equation. Let the $i$th equation of the linear homogeneous system associated with (1.3) be given in the form (1.10), that is, $F_i(x) = 0$, where $F_i$ is some linear function. By assumption, $x$ and $y$ are solutions of the system (1.3), in particular, $F_i(x) = b_i$ and $F_i(y) = b_i$. From this it follows that $F_i(y - x) = F_i(y) - F_i(x) = b_i - b_i = 0$, as asserted.
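
The computation $F_i(y - x) = b_i - b_i = 0$ can be seen on a small system. The following sketch uses a hypothetical system of two equations in three unknowns; the coefficients `coeffs`, the constants `rhs`, and the two solutions `x` and `y` are chosen here only for illustration and are not taken from the text.

```python
def apply_rows(rows, x):
    """Evaluate the linear functions F_i(x) = sum_j a_ij * x_j, one per row."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in rows]


# Hypothetical system: x1 + x2 + x3 = 6, x1 - x2 = 0.
coeffs = [[1, 1, 1], [1, -1, 0]]
rhs = [6, 0]

# Two particular solutions, viewed as points of the affine space K^3.
x = (1, 1, 4)
y = (3, 3, 0)

# The vector xy = y - x from Example 8.4.
xy = tuple(b - a for a, b in zip(x, y))

values_x = apply_rows(coeffs, x)      # equals rhs
values_y = apply_rows(coeffs, y)      # equals rhs
values_xy = apply_rows(coeffs, xy)    # all zeros: xy solves the homogeneous system
```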

Example 8.12 Let us now prove that conversely, every affine subspace of the affine space $(\mathbb{K}^n, \mathbb{K}^n)$ is defined by linear equations, that is, if $V'$ is an affine subspace, then $V'$ coincides with the set of solutions of some system of linear equations. Since $V'$ is a subspace of the affine space $(\mathbb{K}^n, \mathbb{K}^n)$, it follows by definition that the corresponding set of vectors $L'$ is a subspace of the vector space $\mathbb{K}^n$. We saw in Sect. 3.1 (Example 3.8) that it is then defined in $\mathbb{K}^n$ by a homogeneous system of linear equations

$$F_1(x) = 0, \quad F_2(x) = 0, \quad \ldots, \quad F_m(x) = 0. \tag{8.5}$$

Let us consider an arbitrary point $A \in V'$ and set $F_i(A) = b_i$ for all $i = 1, \ldots, m$. We shall prove that then the subspace $V'$ coincides with the set of solutions of the system

$$F_1(x) = b_1, \quad F_2(x) = b_2, \quad \ldots, \quad F_m(x) = b_m. \tag{8.6}$$

Indeed, let us take an arbitrary point $B \in V$. Let the points $A$ and $B$ have coordinates $A = (\alpha_1, \ldots, \alpha_n)$ and $B = (\beta_1, \ldots, \beta_n)$ in some frame of reference. Then the coordinates of the vector $\overrightarrow{AB}$ are equal to $(\beta_1 - \alpha_1, \ldots, \beta_n - \alpha_n)$, and we know that the point $B$ belongs to $V'$ if and only if the vector $x = \overrightarrow{AB}$ belongs to the subspace $L'$, that is, satisfies equations (8.5). Now using the fact that the functions $F_i$ are linear, we obtain that for any one of them,

$$F_i(\beta_1 - \alpha_1, \ldots, \beta_n - \alpha_n) = F_i(\beta_1, \ldots, \beta_n) - F_i(\alpha_1, \ldots, \alpha_n) = F_i(B) - b_i.$$

This implies that the point $B$ belongs to the affine subspace $V'$ if and only if $F_i(B) = b_i$, that is, its coordinates satisfy equations (8.6).

Definition 8.13 Affine subspaces $V'$ and $V''$ are said to be parallel if they have the same set of vectors, that is, if $L' = L''$.

It is easy to see that two parallel subspaces either have no points in common or else coincide. Indeed, suppose that $V'$ and $V''$ are parallel and the point $A$ belongs to $V' \cap V''$. Since the spaces of vectors for $V'$ and $V''$ coincide, it follows that for an arbitrary point $B \in V'$, there exists a point $C \in V''$ such that $\overrightarrow{AB} = \overrightarrow{AC}$. Hence, taking into account the uniqueness of the point $D$ in the relationship (8.1) from the definition of an affine space, it follows that $B = C$, which implies that $V' \subset V''$. Since the definition of parallelism does not depend on the order of the subspaces $V'$ and $V''$, the opposite inclusion $V'' \subset V'$ holds as well, which yields that $V' = V''$.
Let $V'$ and $V''$ be two parallel subspaces, and let us choose in each of them a point: $A \in V'$ and $B \in V''$. Setting the vector $\overrightarrow{AB}$ equal to $a$, we obtain, by definition of the translation $T_a$, that $T_a(A) = B$.

Let us consider an arbitrary point $C \in V'$. It follows from the definition of parallelism that there exists a point $D \in V''$ such that $\overrightarrow{AC} = \overrightarrow{BD}$. From this, it follows easily that $\overrightarrow{CD} = \overrightarrow{AB} = a$; see Fig. 8.1 and Remark 8.2. But this implies that $T_a(C) = D$. In other words, $T_a(V') \subset V''$. Similarly, we obtain that $T_{-a}(V'') \subset V'$, whence from properties 1, 2, and 3 of a translation, it follows that $V'' \subset T_a(V')$. This implies that $T_a(V') = V''$, that is, any two parallel subspaces can be mapped into each other by a translation. Conversely, it is easy to verify that affine subspaces $V'$ and $T_a(V')$ are parallel for any choice of $V'$ and $a$.

Let us consider two different points $A$ and $B$ of an affine space $(V, L)$. Then the totality of all points $C$ whose existence is established by condition (3) in the definition of affine space (with arbitrary scalars $\alpha$) forms, as is easy to see, an affine subspace $V'$. The corresponding vector subspace $L'$ coincides with $\langle \overrightarrow{AB} \rangle$. Therefore, $L'$, and hence also the affine space $(V', L')$, is one-dimensional. It is called the line passing through the points $A$ and $B$.

The notion of a line is related to the general notion of affine subspace by the following result.

Theorem 8.14 In order for a subset $M$ of an affine space $V$ defined over a field of characteristic different from 2 to be an affine subspace of $V$, it is necessary and sufficient that for every two points of $M$, the line passing through them be entirely contained in $M$.

Proof The necessity of this condition is obvious. Let us prove its sufficiency. Let us choose an arbitrary point $O \in M$. We need to prove that the set of vectors $\overrightarrow{OA}$, where $A$ runs over all possible points of the set $M$, forms a subspace $L'$ of the space of vectors $L$ of the affine space $(V, L)$. Then for any other point $B \in M$, the vector $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA}$ will lie in the subspace $L'$, whence $(M, L')$ will be an affine subspace of the space $(V, L)$.

That the product of an arbitrary vector $\overrightarrow{OA}$ in $L'$ and arbitrary scalar $\alpha$ lies in $L'$ derives from the condition that the line passing through $O$ and $A$ is contained in $M$. Let us verify that the sum of two vectors $a = \overrightarrow{OA}$ and $b = \overrightarrow{OB}$ contained in $L'$ is also contained in $L'$. For this, we shall need the condition that we required on the set of points of a line only for $\alpha = 1/2$ (in order for us to be able to apply this condition, we have assumed that the field $\mathbb{K}$ over which the affine space $V$ in question is defined is of characteristic different from 2). Let $C$ be a point of the line passing through $A$ and $B$ such that $\overrightarrow{AC} = \frac{1}{2} \overrightarrow{AB}$. By definition, along with each pair of points $A$ and $B$ of the set $M$, the line passing through them also belongs to this set. Hence it follows in particular that we have $C \in M$ and $\overrightarrow{OC} \in L'$. Let us denote the vector $\overrightarrow{OC}$ by $c$; see Fig. 8.2. Then we have the equalities

$$b = \overrightarrow{OB} = \overrightarrow{OA} + \overrightarrow{AB} = a + \overrightarrow{AB}, \qquad c = \overrightarrow{OC} = \overrightarrow{OA} + \overrightarrow{AC} = a + \overrightarrow{AC},$$

and thus in our case, we have $\overrightarrow{AB} = b - a$ and $\overrightarrow{AC} = c - a$, which implies $c - a = \frac{1}{2}(b - a)$, that is, $c = \frac{1}{2}(a + b)$. Consequently, the vector $a + b$ equals $2c$, and since $c$ is in $L'$, the vector $a + b$ is also in $L'$. □

Now let $A_0, A_1, \ldots, A_m$ be a collection of $m + 1$ points in the affine space $V$. Let us consider the subspace

$$L' = \langle \overrightarrow{A_0A_1}, \overrightarrow{A_0A_2}, \ldots, \overrightarrow{A_0A_m} \rangle$$

of the space $L$. It does not depend on the choice of point $A_0$ among the given points $A_0, A_1, \ldots, A_m$, and we may write it, for example, in the form $\langle \ldots, \overrightarrow{A_iA_j}, \ldots \rangle$ for all $i$ and $j$, $0 \le i, j \le m$. The set $V'$ of all points $B \in V$ for which the vector $\overrightarrow{A_0B}$ is in $L'$ forms an affine subspace whose space of vectors is $L'$. By definition, $\dim V' \le m$, and moreover, $\dim V' = m$ if and only if $\dim L' = m$, that is, the vectors $\overrightarrow{A_0A_1}, \overrightarrow{A_0A_2}, \ldots, \overrightarrow{A_0A_m}$ are linearly independent. This provides the basis for the following definition.

Definition 8.15 Points $A_0, A_1, \ldots, A_m$ of an affine space $V$ for which

$$\dim \langle \overrightarrow{A_0A_1}, \overrightarrow{A_0A_2}, \ldots, \overrightarrow{A_0A_m} \rangle = m$$

are called independent.

For example, the points $A_0, A_1, \ldots, A_n$ (where $n = \dim V$) determine a frame of reference if and only if they are independent. Two distinct points are independent, as are three noncollinear points, and so on. See Fig. 8.3.

The following theorem gives an important property of affine spaces, connecting them with the familiar space of elementary geometry.

Theorem 8.16 There is a unique line passing through every pair of distinct points $A$ and $B$ of an affine space $V$.

Proof It is obvious that distinct points $A$ and $B$ are independent, and the line $V' \subset V$ containing them must coincide with the set of points $C \in V$ for which $\overrightarrow{AC} \in \langle \overrightarrow{AB} \rangle$ (instead of $\overrightarrow{AC}$, one could consider the vector $\overrightarrow{BC}$; it determines the same subspace $V' \subset V$). If $\overrightarrow{AC} = \alpha \overrightarrow{AB}$ and $\overrightarrow{AC'} = \beta \overrightarrow{AB}$, then $\overrightarrow{CC'} = (\beta - \alpha) \overrightarrow{AB}$, whence it follows that $V'$ is a line. □


Having selected on any line $P$ of the affine space $V$ the point $O$ (reference point) and arbitrary point $E \in P$ not equal to $O$ (scale of measurement), we obtain for an arbitrary point $A \in P$ the relationship

$$\overrightarrow{OA} = \alpha \overrightarrow{OE}, \tag{8.7}$$

where $\alpha$ is some scalar, that is, an element of the field $\mathbb{K}$ over which the affine space $V$ under consideration is defined. The assignment $A \mapsto \alpha$, as is easily verified, establishes a bijection between the points $A \in P$ and scalars $\alpha$. This correspondence, of course, depends on the choice of points $O$ and $E$ on the line. In fact, we have here a special case of the notion of coordinates relative to a frame of reference $(O; e)$ on the affine line $P$, where $e = \overrightarrow{OE}$.

As a result, we may associate with any three collinear points $A$, $B$, and $C$ of an affine space, excepting only the case $A = B = C$, a scalar $\alpha$, called the affine ratio of the points $A$, $B$, and $C$ and denoted by $(A, B, C)$. This is accomplished as follows. If $A \ne B$, then $\alpha$ is uniquely determined by the relationship $\overrightarrow{AC} = \alpha \overrightarrow{AB}$. In particular, $\alpha = 1$ if $B = C$, and $\alpha = 0$ if $A = C$. If $A = B \ne C$, then we take $\alpha = \infty$. And if all three points $A$, $B$, and $C$ coincide, then their affine ratio $(A, B, C)$ is undefined.
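
The case distinction in the definition of the affine ratio can be followed in a short sketch. It assumes points of $\mathbb{Q}^n$ given as tuples of integers or fractions; the function name `affine_ratio` is hypothetical, and representing $\infty$ by the floating-point value `float("inf")` is merely a convention of this sketch.

```python
from fractions import Fraction


def affine_ratio(A, B, C):
    """Return alpha with AC = alpha * AB for collinear points A, B, C.

    Returns float("inf") when A == B != C and raises ValueError when the
    ratio is undefined (A == B == C) or the points are not collinear.
    """
    ab = [b - a for a, b in zip(A, B)]
    ac = [c - a for a, c in zip(A, C)]
    if all(x == 0 for x in ab):
        if all(x == 0 for x in ac):
            raise ValueError("affine ratio is undefined when A = B = C")
        return float("inf")
    # Read alpha off any nonzero coordinate of AB, then confirm AC = alpha * AB.
    i = next(k for k, x in enumerate(ab) if x != 0)
    alpha = Fraction(ac[i], ab[i])
    if any(alpha * x != y for x, y in zip(ab, ac)):
        raise ValueError("points are not collinear")
    return alpha


A, B = (0, 0), (2, 4)
midpoint_ratio = affine_ratio(A, B, (1, 2))   # C is the midpoint of A and B
ratio_b_equals_c = affine_ratio(A, B, B)      # the case B = C
ratio_a_equals_c = affine_ratio(A, B, A)      # the case A = C
ratio_a_equals_b = affine_ratio(A, A, B)      # the case A = B != C
```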

Using the concept of oriented length of a line segment, we can write the affine ratio of three points using the following formula:

$$(A, B, C) = \frac{AC}{AB}, \tag{8.8}$$

where $AB$ denotes the signed length of $AB$, that is, $AB = |AB|$ if the point $A$ lies to the left of $B$, and $AB = -|AB|$ if the point $A$ lies to the right of $B$. Here, of course, in formula (8.8), we assume that $a/0 = \infty$ for every $a \ne 0$.

For the remainder of this section, we shall assume that $V$ is a real affine space. In this case, obviously, the numbers $\alpha$ from relationship (8.7) corresponding to the points of the line $P$ are real, and the relationship $\alpha < \gamma < \beta$ between numbers on the real line carries over to the corresponding points of the line $P \subset V$. If these numbers $\alpha$, $\beta$, and $\gamma$ correspond to the points $A$, $B$, and $C$, then we say that the point $C$ lies between the points $A$ and $B$.

Despite the fact that the relationship $A \mapsto \alpha$ defined by formula (8.7) itself depends on the choice of distinct points $O$ and $E$ on the line, the property of point $C$ that it lie between $A$ and $B$ does not depend on that choice (although with a different choice of $O$ and $E$, the order of the points $A$ and $B$ might, of course, change). Indeed, it is easy to verify that by replacing the point $O$ by $O'$, to each of the numbers $\alpha$, $\beta$, and $\gamma$ is added one and the same term $\lambda$ corresponding to the vector $\overrightarrow{OO'}$, and in replacing the point $E$ by $E'$, each of the numbers $\alpha$, $\beta$, and $\gamma$ is multiplied by one and the same number $\mu \ne 0$ such that $\overrightarrow{OE} = \mu \overrightarrow{OE'}$. For both operations, the relationship $\alpha < \gamma < \beta$ for the point $C$ and pair of points $A$ and $B$ is unchanged, except that the numbers $\alpha$ and $\beta$ in this inequality may exchange places (if they are multiplied by $\mu < 0$).

The property of a point C to lie between A and B is related to the affine ratio 
for three collinear points introduced above. Namely, it is obvious that in the case of 
a real space, the inequality (C, A, B) < 0 is satisfied if and only if the point C lies 
between A and B . 

Definition 8.17 The collection of all points on the line passing through the points $A$ and $B$ that lie between $A$ and $B$ together with $A$ and $B$ themselves is called the segment joining the points $A$ and $B$ and is denoted by $[A, B]$. Here the points $A$ and $B$ are called the endpoints of the segment, and by definition, they belong to it.

Thus the segment is determined by two points $A$ and $B$, but not by their order, that is, by definition $[B, A] = [A, B]$.

Definition 8.18 A set $M \subset V$ is said to be convex if for every pair of points $A, B \in M$, the set $M$ also contains the segment $[A, B]$.

The notion of convexity is related to the partition of an affine space $V$ by a hyperplane $V'$ into two half-spaces, in analogy with the partition of a vector space into two half-spaces constructed in Sect. 3.2. In order to define this partition, let us denote by $L' \subset L$ the hyperplane corresponding to $V'$, and let us consider the partition $L \setminus L' = L^+ \cup L^-$ introduced earlier, choose an arbitrary point $O' \in V'$, and for a point $A \in V \setminus V'$, state that $A \in V^+$ or $A \in V^-$ depending on the half-space ($L^+$ or $L^-$) to which the vector $\overrightarrow{O'A}$ belongs.

A simple verification shows that the subsets $V^+$ and $V^-$ thus obtained depend only on the half-spaces $L^+$ and $L^-$ and not on the choice of point $O' \in V'$. Obviously, $V \setminus V' = V^+ \cup V^-$ and $V^+ \cap V^- = \emptyset$.

Theorem 8.19 The sets $V^+$ and $V^-$ are convex, but the entire set $V \setminus V'$ is not.

Proof Let us begin by verifying the assertion about the set $V^+$. Let $A, B \in V^+$. This implies that the vectors $x = \overrightarrow{O'A}$ and $y = \overrightarrow{O'B}$ belong to the half-space $L^+$, that is, they can be expressed in the form

$$x = \alpha e + u, \quad y = \beta e + v, \quad \alpha, \beta > 0, \quad u, v \in L', \tag{8.9}$$

for some fixed vector $e \notin L'$. Let us consider the vector $z = \overrightarrow{O'C}$ and write it in the form

$$z = \gamma e + w, \quad w \in L'. \tag{8.10}$$

Assuming that the point $C$ lies between $A$ and $B$, let us prove that $z \in L^+$, that is, that $\gamma > 0$. The given condition, that the point $C$ lies between $A$ and $B$, can be written with the help of an association between the points on the line passing through $A$ and $B$ and the numbers that are the coordinates in the frame of reference $(O; \overrightarrow{OE})$ according to formula (8.7). Although this association depends on the choice of points $O$ and $E$, the property itself of "lying between," as we have seen, does not depend on this choice. Therefore, we may choose $O = A$ and $E = B$. Then in our frame of reference, the point $A$ has coordinate 0, and the point $B$ has coordinate 1. Let $C$ have coordinate $\lambda$. Since $C \in [A, B]$, it follows that $0 \le \lambda \le 1$. By definition, $\overrightarrow{AC} = \lambda \overrightarrow{AB}$. But from the fact that

$$\overrightarrow{AC} = \overrightarrow{AO'} + \overrightarrow{O'C} = z - x, \qquad \overrightarrow{AB} = \overrightarrow{AO'} + \overrightarrow{O'B} = y - x,$$

we obtain the equality $z - x = \lambda(y - x)$, or equivalently, the equality

$$z = (1 - \lambda)x + \lambda y.$$

Using formulas (8.9) and (8.10), we obtain from the last equality the relationship $\gamma = (1 - \lambda)\alpha + \lambda\beta$, from which, taking into account the inequalities $\alpha > 0$, $\beta > 0$, and $0 \le \lambda \le 1$, it follows that $\gamma > 0$.

The convexity of the set $V^-$ is proved in exactly the same way.

We shall prove, finally, that the set $V \setminus V'$ is not convex. In view of the convexity of $V^+$ and $V^-$, of interest to us is only the case in which the points $A$ and $B$ lie in different half-spaces, for example, $A \in V^+$ and $B \in V^-$ (or conversely, $A \in V^-$ and $B \in V^+$, but this case is completely analogous). The condition $A \in V^+$ and $B \in V^-$ means that in formulas (8.9), we have $\alpha > 0$ and $\beta < 0$. In analogy to what has gone before, for an arbitrary point $C \in [A, B]$, let us construct the vector $z$ as was done in (8.10), and thus obtain the equality $\gamma = (1 - \lambda)\alpha + \lambda\beta$. If the numbers $\alpha$ and $\beta$ are of opposite sign, an elementary computation shows that there always exists a number $\lambda \in [0, 1]$ such that $(1 - \lambda)\alpha + \lambda\beta = 0$, and this yields a point $C \in [A, B]$ for which $\gamma = 0$, that is, a point of the segment lying in $V'$ and therefore not in $V \setminus V'$. Thus the theorem is proved in its entirety. □
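
The "elementary computation" in the last step of the proof can be made explicit: for $\alpha$ and $\beta$ of opposite sign, the value $\lambda = \alpha / (\alpha - \beta)$ lies in $[0, 1]$ and makes $\gamma$ vanish. The sketch below checks this on sample numbers and also samples the case $\alpha, \beta > 0$; the function names `gamma` and `crossing_parameter` and the sample values are assumptions of this illustration.

```python
from fractions import Fraction


def gamma(alpha, beta, lam):
    """The coefficient gamma = (1 - lambda) * alpha + lambda * beta from the proof."""
    return (1 - lam) * alpha + lam * beta


def crossing_parameter(alpha, beta):
    """For alpha, beta of opposite sign, the lambda in [0, 1] with gamma == 0."""
    return Fraction(alpha) / (Fraction(alpha) - Fraction(beta))


# Points in different half-spaces: the segment meets the hyperplane.
lam = crossing_parameter(3, -1)
gamma_at_crossing = gamma(3, -1, lam)

# Points in the same half-space: gamma stays positive along the sampled segment.
same_side_positive = all(gamma(3, 1, Fraction(k, 10)) > 0 for k in range(11))
```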

Thus the set $V^+$ is characterized by the property that every pair of its points are connected by a segment lying entirely within it. This holds as well for the set $V^-$. At the same time, no two points $A \in V^+$ and $B \in V^-$ can be joined by a segment that does not intersect the hyperplane $V'$. This consideration gives another definition of the partition $V \setminus V' = V^+ \cup V^-$, one that does not appeal to vector spaces.

Let us consider the sequence of subspaces

$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_n = V, \quad \dim V_i = i. \tag{8.11}$$

From the last condition, it follows that $V_{i-1}$ is a hyperplane in $V_i$, and this implies that the partition defined by $V_i \setminus V_{i-1} = V_i^+ \cup V_i^-$ is the partition introduced above.

A pair of subspaces $(V_{i-1}, V_i)$ is said to be directed if it is indicated which of two convex subsets of the set $V_i \setminus V_{i-1}$ we denote by $V_i^+$, and which by $V_i^-$. The sequence of subspaces (8.11) is called a flag if each pair $(V_{i-1}, V_i)$ is directed. We note that in a flag defined by the sequence (8.11), the subspace $V_0$ has dimension 0, that is, it consists of a single point. This point is called the center of the flag.


8.3 Affine Transformations 

Definition 8.20 An affine transformation of an affine space $(V, L)$ into another affine space $(V', L')$ is a pair of mappings

$$f : V \to V', \qquad \mathcal{F} : L \to L',$$

satisfying the following two conditions:

(1) The mapping $\mathcal{F} : L \to L'$ is a linear transformation of vector spaces $L \to L'$.

(2) For every pair of points $A, B \in V$, we have the equality

$$\overrightarrow{f(A)f(B)} = \mathcal{F}(\overrightarrow{AB}).$$

Condition (2) means that the linear transformation $\mathcal{F}$ is determined by the mapping $f$. It is called the linear part of the mapping $f$ and is denoted by $\Lambda(f)$. In the sequel we shall, as a rule, indicate only the mapping $f : V \to V'$, since the linear part $\mathcal{F}$ is uniquely determined by it, and we shall view the affine transformation as a mapping from $V$ to $V'$.

Theorem 8.21 Affine transformations possess the following properties:

(a) The composition of two affine transformations $f$ and $g$ is again an affine transformation, which we denote by $gf$. Here $\Lambda(gf) = \Lambda(g)\Lambda(f)$.

(b) An affine transformation $f$ is bijective if and only if the linear transformation $\Lambda(f)$ is bijective. In this case, the inverse transformation $f^{-1}$ is also an affine transformation, and $\Lambda(f^{-1}) = \Lambda(f)^{-1}$.

(c) If $f = e$, the identity transformation, then $\Lambda(f) = \mathcal{E}$.

Proof All these assertions are proved by direct verification.

(a) Let $(V, L)$, $(V', L')$, and $(V'', L'')$ be affine spaces. Let us consider the affine transformation $f : V \to V'$ with linear part $\mathcal{F} = \Lambda(f)$ and another affine transformation $g : V' \to V''$ with linear part $\mathcal{G} = \Lambda(g)$. We shall denote the composition of $f$ and $g$ by $h$, and the composition of $\mathcal{F}$ and $\mathcal{G}$ by $\mathcal{H}$. Then by the definition of the composition of arbitrary mappings of sets, we have $h : V \to V''$ and $\mathcal{H} : L \to L''$, and moreover, we know that $\mathcal{H}$ is a linear transformation. Thus we must show that every pair of points $A, B \in V$ satisfies the equality $\overrightarrow{h(A)h(B)} = \mathcal{H}(\overrightarrow{AB})$. But since by definition, we have the equalities

$$\overrightarrow{f(A)f(B)} = \mathcal{F}(\overrightarrow{AB}), \qquad \overrightarrow{g(A')g(B')} = \mathcal{G}(\overrightarrow{A'B'})$$

for arbitrary points $A, B \in V$ and $A', B' \in V'$, it follows that

$$\overrightarrow{h(A)h(B)} = \overrightarrow{g(f(A))g(f(B))} = \mathcal{G}(\overrightarrow{f(A)f(B)}) = \mathcal{G}(\mathcal{F}(\overrightarrow{AB})) = \mathcal{H}(\overrightarrow{AB}).$$

The proofs of assertions (b) and (c) are just as straightforward. □
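
Assertion (a) can be illustrated in coordinates. The sketch below assumes that an affine transformation of $\mathbb{Q}^2$ into itself is written as $x \mapsto Mx + t$, so that its linear part is given by the matrix $M$; this coordinate description, the helper names, and the sample matrices are assumptions made for the illustration.

```python
def mat_vec(M, x):
    """Product of the matrix M (a tuple of rows) and the column x."""
    return tuple(sum(m * xj for m, xj in zip(row, x)) for row in M)


def mat_mul(P, Q):
    """Matrix product P Q, corresponding to applying Q first and then P."""
    cols = list(zip(*Q))
    return tuple(tuple(sum(p * q for p, q in zip(row, col)) for col in cols)
                 for row in P)


def affine(M, t):
    """The map x -> M x + t, an affine transformation with linear part M."""
    return lambda x: tuple(y + s for y, s in zip(mat_vec(M, x), t))


def vec(a, b):
    """Vector AB = b - a, as in Example 8.4."""
    return tuple(y - x for x, y in zip(a, b))


F, tf = ((1, 2), (0, 1)), (3, -1)
G, tg = ((0, -1), (1, 0)), (1, 1)
f, g = affine(F, tf), affine(G, tg)


def h(x):
    """The composition gf: first f, then g."""
    return g(f(x))


A, B = (1, 1), (4, -2)
image_vector = vec(h(A), h(B))                      # vector h(A)h(B)
linear_part_applied = mat_vec(mat_mul(G, F), vec(A, B))  # (GF) applied to AB
```

The two computed vectors agree, as the equality $\Lambda(gf) = \Lambda(g)\Lambda(f)$ requires for this pair of points.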

Let us give some examples of affine transformations. 

Example 8.22 For affine spaces $(L, L)$ and $(L', L')$, a linear transformation $f = \mathcal{F} : L \to L'$ is affine, and moreover, it is obvious that $\Lambda(f) = \mathcal{F}$.

In the sequel, we shall frequently encounter affine transformations in which the affine spaces $V$ and $V'$ coincide (and this also applies to the spaces of vectors $L$ and $L'$). We shall call such an affine transformation of a space $V$ an affine transformation of the space into itself.

Example 8.23 A translation $T_a$ by an arbitrary vector $a \in L$ is an affine transformation of the space $V$ into itself. It follows from the definition of translation that $\Lambda(T_a) = \mathcal{E}$. Conversely, every affine transformation whose linear part is equal to $\mathcal{E}$ is a translation. Indeed, by the definition of an affine transformation, the condition $\Lambda(f) = \mathcal{E}$ implies that $\overrightarrow{f(A)f(B)} = \overrightarrow{AB}$. Recalling Remark 8.2 and Fig. 8.1, we see that from this assertion follows the equality $\overrightarrow{Af(A)} = \overrightarrow{Bf(B)}$, which implies that $f = T_a$, where the vector $a$ is equal to $\overrightarrow{Af(A)}$ for some (any) point $A$ of the space $V$.

The same reasoning allows us to obtain a more general result.

Theorem 8.24 If affine transformations $f : V \to V'$ and $g : V \to V'$ have identical linear parts, then they differ only by a translation, that is, there exists a vector $a \in L'$ such that $g = T_a f$.

Proof By definition, the equality $\Lambda(f) = \Lambda(g)$ implies that $\overrightarrow{f(A)f(B)} = \overrightarrow{g(A)g(B)}$ for every pair of points $A, B \in V$. From this, the equality

$$\overrightarrow{f(A)g(A)} = \overrightarrow{f(B)g(B)} \tag{8.12}$$

clearly follows. As in Example 8.23, this reasoning is based on Remark 8.2. The relationship (8.12) implies that the vector $\overrightarrow{f(A)g(A)}$ does not depend on the choice of the point $A$. We shall denote this vector by $a$. Then by the definition of translation, $g(A) = T_a(f(A))$ for every point $A \in V$, which completes the proof of the theorem. □

Definition 8.25 Let $V' \subset V$ be a subspace of the affine space $V$. An affine transformation $f : V \to V'$ is said to be a projection onto the subspace $V'$ if $f(V) = V'$ and the restriction of $f$ to $V'$ is the identity transformation.



Fig. 8.4 Fibers of a projection


Theorem 8.26 If $f : V \to V'$ is a projection onto the subspace $V' \subset V$, then the preimage $f^{-1}(A')$ of an arbitrary point $A' \in V'$ is an affine subspace of $V$ of dimension $\dim V - \dim V'$. For distinct points $A', A'' \in V'$, the subspaces $f^{-1}(A')$ and $f^{-1}(A'')$ are parallel.

Proof Let $\mathcal{F} = \Lambda(f)$. Then $\mathcal{F} : L \to L'$ is a linear transformation, where $L$ and $L'$ are the respective spaces of vectors of the affine spaces $V$ and $V'$. Let us consider an arbitrary point $A' \in V'$ and points $P, Q \in f^{-1}(A')$, that is, $f(P) = f(Q) = A'$. Then the vector $\overrightarrow{f(P)f(Q)}$ is equal to $0$, whence by the definition of an affine transformation, we obtain that $\overrightarrow{f(P)f(Q)} = \mathcal{F}(\overrightarrow{PQ}) = 0$, that is, the vector $\overrightarrow{PQ}$ is in the kernel of the linear transformation $\mathcal{F}$, which, as we know, is a subspace of $L$. Conversely, if $P \in f^{-1}(A')$ and the vector $x$ is in the kernel of the transformation $\mathcal{F}$, that is, $\mathcal{F}(x) = 0$, then there exists a point $Q \in V$ for which $x = \overrightarrow{PQ}$. Then $f(P) = f(Q)$ and $Q \in f^{-1}(A')$.

Since $f(V) = V'$, an arbitrary vector $x' = \overrightarrow{A'B'} \in L'$ can be represented in the form $\mathcal{F}(\overrightarrow{PQ})$, where $f(P) = A'$ and $f(Q) = B'$. This means that the image of the transformation $\mathcal{F}$ coincides with the entire space $L'$, whence by Theorem 3.72, we obtain

$$\dim f^{-1}(A') = \dim \mathcal{F}^{-1}(0) = \dim L - \dim L' = \dim V - \dim V',$$

since $\mathcal{F}^{-1}(0)$ is the kernel of the transformation $\mathcal{F}$, and the number $\dim L'$ is equal to its rank; see Fig. 8.4. We have already proved that for every point $A' \in V'$, the space of vectors of the affine space $f^{-1}(A')$ coincides with $\mathcal{F}^{-1}(0)$. Hence all these subspaces are parallel, which completes the proof of the theorem. □

The subspaces $f^{-1}(A')$ for the points $A' \in V'$ are called fibers of the projection $f : V \to V'$; see Fig. 8.4. If $S' \subset V'$ is some subset (not necessarily a subspace), then its preimage, the set $S = f^{-1}(S')$, is called a cylinder in $V$.

Definition 8.27 An affine transformation $f : V \to V'$ is called an isomorphism if it is a bijection. Affine spaces $V$ and $V'$ in this case are said to be isomorphic.

By assertion (b) of Theorem 8.21, the condition that a transformation $f : V \to V'$ is a bijection is equivalent to the bijectivity of the linear transformation $\Lambda(f) : L \to L'$ of the corresponding spaces of vectors $L$ and $L'$. Thus affine spaces $V$ and $V'$ are isomorphic if and only if the corresponding spaces of vectors $L$ and $L'$ are isomorphic. As shown in Sect. 3.5, vector spaces $L$ and $L'$ are isomorphic if and only if $\dim L = \dim L'$, and in this situation every nonsingular linear transformation $L \to L'$ is an isomorphism. This yields the following assertion: affine spaces $V$ and $V'$ are isomorphic if and only if $\dim V = \dim V'$. Here every affine transformation $f : V \to V'$ whose linear part $\Lambda(f)$ is nonsingular is an isomorphism between $V$ and $V'$. We shall frequently call an affine transformation $f$ with nonsingular linear part $\Lambda(f)$ nonsingular.

From the definitions, we immediately obtain the following theorem.

Theorem 8.28 The affine ratio $(A, B, C)$ of three collinear points does not change under a nonsingular affine transformation.

Proof By definition, the affine ratio $\alpha = (A, B, C)$ of three points $A, B, C$ under the condition $A \neq B$ is defined by the relationship

$$\overrightarrow{AC} = \alpha \overrightarrow{AB}. \tag{8.13}$$

Let $f : V \to V$ be a nonsingular affine transformation and $\mathcal{F} : L \to L$ its corresponding linear transformation. Then in view of the nonsingularity of the transformation $f$, we have $f(A) \neq f(B)$ and

$$\overrightarrow{f(A)f(C)} = \mathcal{F}(\overrightarrow{AC}), \qquad \overrightarrow{f(A)f(B)} = \mathcal{F}(\overrightarrow{AB}),$$

and $\beta = (f(A), f(B), f(C))$ is defined by the equality $\overrightarrow{f(A)f(C)} = \beta\, \overrightarrow{f(A)f(B)}$, that is,

$$\mathcal{F}(\overrightarrow{AC}) = \beta \mathcal{F}(\overrightarrow{AB}). \tag{8.14}$$

Applying the transformation $\mathcal{F}$ to both sides of equality (8.13), we obtain $\mathcal{F}(\overrightarrow{AC}) = \alpha \mathcal{F}(\overrightarrow{AB})$, whence taking into account equality (8.14), it follows that $\beta = \alpha$. In the case that $A = B \neq C$, we obtain, in view of the nonsingularity of $f$, the analogous relationship $f(A) = f(B) \neq f(C)$, from which we have $(A, B, C) = \infty$ and $(f(A), f(B), f(C)) = \infty$. □
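Theorem 8.28 is easy to check numerically. The following is only a sketch in Python (not part of the original text), under the assumption that the affine space is the coordinate plane and that points are written as pairs of integers. The function names `affine_ratio` and `f`, and the particular matrix and shift, are hypothetical choices made for illustration.

```python
from fractions import Fraction

def affine_ratio(A, B, C):
    """Return alpha with AC = alpha * AB for collinear points, A != B."""
    ab = [b - a for a, b in zip(A, B)]
    ac = [c - a for a, c in zip(A, C)]
    # Use any coordinate in which the vector AB is nonzero.
    i = next(k for k, v in enumerate(ab) if v != 0)
    alpha = Fraction(ac[i], ab[i])
    assert all(alpha * u == w for u, w in zip(ab, ac)), "points are not collinear"
    return alpha

def f(P):
    """Hypothetical nonsingular affine map: matrix [[2, 1], [1, 1]], shift (3, -5)."""
    x, y = P
    return (2 * x + y + 3, x + y - 5)

A, B, C = (0, 0), (2, 4), (3, 6)
ratio_before = affine_ratio(A, B, C)
ratio_after = affine_ratio(f(A), f(B), f(C))
print(ratio_before, ratio_after)
```

Both ratios are computed exactly with `Fraction`, so the comparison does not depend on rounding.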


Example 8.29 Every affine space $(V, L)$ is isomorphic to the space $(L, L)$. Indeed, let us choose in the set $V$ an arbitrary point $O$ and define the mapping $f : V \to L$ in such a way that $f(A) = \overrightarrow{OA}$. It is obvious, by the definition of affine space, that the mapping $f$ is an isomorphism.

Let us note that the situation here is similar to that of an isomorphism of a vector space $L$ and the dual space $L^*$. In one case, the isomorphism requires the choice of a basis of $L$, while in the other, it is the choice of a point $O$ in $V$.

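Let $f : V \to V'$ be an affine transformation of affine spaces $(V, L)$ and $(V', L')$. Let us consider isomorphisms $\varphi : V \to L$ and $\varphi' : V' \to L'$, defined, as in Example 8.29, by the selection of certain points $O \in V$ and $O' \in V'$. We have the mappings

$$\begin{array}{ccc}
V & \xrightarrow{\ f\ } & V' \\
{\scriptstyle\varphi}\big\downarrow & & \big\downarrow{\scriptstyle\varphi'} \\
L & \xrightarrow{\ \mathcal{F}\ } & L'
\end{array} \tag{8.15}$$

where $\mathcal{F} = \Lambda(f)$. Here, generally speaking, we cannot assert that $\mathcal{F}\varphi = \varphi' f$, but nevertheless, these mappings are closely related. For an arbitrary point $A \in V$, we have by construction that $\varphi(A) = \overrightarrow{OA}$ and $\mathcal{F}(\varphi(A)) = \mathcal{F}(\overrightarrow{OA}) = \overrightarrow{f(O)f(A)}$. In just the same way, $\varphi'(f(A)) = \overrightarrow{O'f(A)}$. Finally, $\overrightarrow{O'f(A)} = \overrightarrow{O'f(O)} + \overrightarrow{f(O)f(A)}$. Combining these relationships, we obtain

$$\varphi' f = T_b \mathcal{F} \varphi, \quad \text{where } b = \overrightarrow{O'f(O)}. \tag{8.16}$$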


Relationship (8.16) allows us to write down the action of affine transformations in coordinate form. To do so, we choose frames of reference $(O; e_1, \ldots, e_n)$ and $(O'; e'_1, \ldots, e'_m)$, where $n = \dim V$ and $m = \dim V'$, in the spaces $V$ and $V'$. Then the coordinates of the point $A$ in the chosen frame of reference are the coordinates of the vector $\overrightarrow{OA} = \varphi(A)$ in the basis $e_1, \ldots, e_n$. Likewise, the coordinates of the point $f(A)$ are the coordinates of the vector $\overrightarrow{O'f(A)} = \varphi'(f(A))$ in the basis $e'_1, \ldots, e'_m$.

Let us make use of relationship (8.16). Suppose the coordinates of the vector $\overrightarrow{OA}$ in the basis $e_1, \ldots, e_n$ are $(\alpha_1, \ldots, \alpha_n)$, the coordinates of the vector $\overrightarrow{O'f(A)}$ in the basis $e'_1, \ldots, e'_m$ are $(\alpha'_1, \ldots, \alpha'_m)$, and the matrix of the linear transformation $\mathcal{F}$ in these bases is $F = (f_{ij})$. Setting the coordinates of the vector $b$ from formula (8.16) in the basis $e'_1, \ldots, e'_m$ equal to $(\beta_1, \ldots, \beta_m)$, we obtain

$$\alpha'_i = \sum_{j=1}^{n} f_{ij}\alpha_j + \beta_i, \quad i = 1, \ldots, m. \tag{8.17}$$

Using the standard notation for column vectors

$$[\alpha] = \begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_n\end{pmatrix}, \quad [\alpha'] = \begin{pmatrix}\alpha'_1\\ \vdots\\ \alpha'_m\end{pmatrix}, \quad [\beta] = \begin{pmatrix}\beta_1\\ \vdots\\ \beta_m\end{pmatrix},$$

we may rewrite formula (8.17) in the form

$$[\alpha'] = F[\alpha] + [\beta]. \tag{8.18}$$
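Formula (8.18) makes assertion (a) of Theorem 8.21 easy to verify in coordinates. The following Python sketch is an illustration only, not part of the original text. It assumes that matrices are stored as lists of rows; the helper names `mat_vec`, `mat_mul`, `affine` and all numerical data are hypothetical. It checks that the composition $gf$ has matrix $GF$ and shift $G[\beta_f] + [\beta_g]$.

```python
from fractions import Fraction

def mat_vec(F, x):
    """Multiply the matrix F (a list of rows) by the column x."""
    return [sum(f * a for f, a in zip(row, x)) for row in F]

def mat_mul(G, F):
    """Return the matrix product GF."""
    cols = list(zip(*F))
    return [[sum(g * f for g, f in zip(row, col)) for col in cols] for row in G]

def affine(F, beta):
    """Return the map [alpha] -> F[alpha] + [beta] of formula (8.18)."""
    return lambda alpha: [y + b for y, b in zip(mat_vec(F, alpha), beta)]

# Hypothetical data: f maps a plane to 3-space, g maps 3-space to a plane.
F, beta_f = [[1, 2], [0, 1], [3, -1]], [1, 0, 2]
G, beta_g = [[2, 0, 1], [1, 1, 0]], [-1, 4]
f, g = affine(F, beta_f), affine(G, beta_g)

# The composition gf has matrix GF and shift G[beta_f] + [beta_g].
H = mat_mul(G, F)
beta_h = [y + b for y, b in zip(mat_vec(G, beta_f), beta_g)]
h = affine(H, beta_h)

point = [Fraction(1, 2), -3]
print(H, beta_h, g(f(point)) == h(point))
```

The shift of the composition is not simply the sum of the two shifts: the shift of $f$ is first transformed by the linear part of $g$.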


The most frequent case that we shall encounter in the sequel is that of transformations of an affine space $V$ into itself. Let us assume that the mapping $f : V \to V$ has a fixed point $O$, that is, for the point $O \in V$, we have $f(O) = O$. Then the transformation $f$ can be identified with its linear part, that is, if the frame of reference $(O; e_1, \ldots, e_n)$ with fixed point $O$ is used to identify the affine space $V$ with the vector space $L$, then the mapping $f$ is identified with its linear part $\mathcal{F} = \Lambda(f)$. Here $f(O) = O$ and $\overrightarrow{Of(A)} = \mathcal{F}(\overrightarrow{OA})$ for every point $A \in V$.

We shall call such affine transformations of a space $V$ into itself linear (we note that this notion depends on the choice of point $O \in V$ that $f$ maps to itself). If for an arbitrary affine transformation $f$ we define $f_0 = T_a^{-1} f$, where the vector $a$ is equal to $\overrightarrow{Of(O)}$, then $f_0$ will be a linear transformation, and we obtain the representation

$$f = T_a f_0. \tag{8.19}$$

It is obvious that a nonsingular affine transformation of the space $(V, L)$ takes each frame of reference $(O; e_1, \ldots, e_n)$ into some other frame of reference. This means that if $f(O) = O'$ and $\Lambda(f)(e_i) = e'_i$, then $(O'; e'_1, \ldots, e'_n)$ is also a frame of reference. Conversely, if the transformation $f$ takes some frame of reference to another frame of reference, then it is nonsingular.

From the representation (8.19) we obtain the following result.

If we are given a frame of reference $(O; e_1, \ldots, e_n)$, an arbitrary point $O'$, and vectors $a_1, \ldots, a_n$ in $L$, then there exists a unique affine transformation $f$ mapping $O$ to $O'$ such that $\Lambda(f)(e_i) = a_i$ for all $i = 1, \ldots, n$. To prove this, we set $a$ equal to $\overrightarrow{OO'}$ in representation (8.19), and for $f_0$, we take a linear transformation of the vector space $L$ into itself such that $f_0(e_i) = a_i$ for all $i = 1, \ldots, n$. It is obvious that the affine transformation $f$ thus constructed satisfies the requisite conditions. Its uniqueness follows from the representation (8.19) and from the fact that the vectors $e_1, \ldots, e_n$ form a basis of $L$.

The following reformulation of this statement is obvious: if we are given $n + 1$ independent points $A_0, A_1, \ldots, A_n$ of an $n$-dimensional affine space $V$ and an additional arbitrary $n + 1$ points $B_0, B_1, \ldots, B_n$, then there exists a unique affine transformation $f : V \to V$ such that $f(A_i) = B_i$ for all $i = 0, 1, \ldots, n$.
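For $n = 2$ the last statement can be turned into a short computation. The Python sketch below is an illustration under stated assumptions, not a construction from the original text: points of the plane are pairs of integers, the affine transformation is sought in the form $[\alpha'] = F[\alpha] + [\beta]$, and the function name `affine_from_points` and the sample points are hypothetical.

```python
from fractions import Fraction

def affine_from_points(A, B):
    """Given independent points A[0], A[1], A[2] of the plane and arbitrary
    points B[0], B[1], B[2], return (F, beta) with F*A[i] + beta = B[i]."""
    # Basis vectors e_i = A0A_i and their required images d_i = B0B_i.
    e = [[A[i][k] - A[0][k] for k in range(2)] for i in (1, 2)]
    d = [[B[i][k] - B[0][k] for k in range(2)] for i in (1, 2)]
    M = [[e[0][0], e[1][0]], [e[0][1], e[1][1]]]   # columns e_1, e_2
    D = [[d[0][0], d[1][0]], [d[0][1], d[1][1]]]   # columns d_1, d_2
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("the points A0, A1, A2 are not independent")
    inv = [[Fraction(M[1][1], det), Fraction(-M[0][1], det)],
           [Fraction(-M[1][0], det), Fraction(M[0][0], det)]]
    # F is determined by F*M = D, that is, F = D * M^{-1}.
    F = [[sum(D[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
         for r in range(2)]
    # The shift is fixed by the condition f(A0) = B0.
    beta = [B[0][r] - sum(F[r][k] * A[0][k] for k in range(2)) for r in range(2)]
    return F, beta

def apply(F, beta, P):
    return [sum(F[r][k] * P[k] for k in range(2)) + beta[r] for r in range(2)]

A = [(1, 1), (2, 3), (0, 2)]
B = [(0, 0), (3, 0), (0, 3)]
F, beta = affine_from_points(A, B)
print(F, beta, [apply(F, beta, P) for P in A])
```

The error raised for a zero determinant corresponds to the requirement that the points $A_0, A_1, A_2$ be independent.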

In the sequel, it will be useful to know about the dependence of the vector $a$ in representation (8.19) on the choice of point $O$ (the transformation $f_0$ of the space $V$ also depends on this choice, but as a transformation of the vector space $L$, it coincides with $\Lambda(f)$). Let us set $\overrightarrow{OO'} = c$. Then for a new choice of $O'$ as fixed point, we have, similar to (8.19), the representation

$$f = T_{a'} f'_0, \tag{8.20}$$

where $f'_0(O') = O'$ and the vector $a'$ is equal to $\overrightarrow{O'f(O')}$. By well-known rules, we have

$$a' = \overrightarrow{O'f(O')} = \overrightarrow{O'O} + \overrightarrow{Of(O')},$$
$$\overrightarrow{Of(O')} = \overrightarrow{Of(O)} + \overrightarrow{f(O)f(O')} = a + \mathcal{F}(c).$$

Since $\overrightarrow{O'O} = -\overrightarrow{OO'}$, we obtain that the vectors $a$ and $a'$ in representations (8.19) and (8.20) are related by

$$a' = a + \mathcal{F}(c) - c, \quad \text{where } c = \overrightarrow{OO'}. \tag{8.21}$$
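Relationship (8.21) can be checked on a concrete example. The Python sketch below is only an illustration, not part of the original text. It assumes the coordinate plane with integer coordinates; the map `f`, its linear part `linear_part`, and the points `O` and `O_new` are hypothetical choices.

```python
def f(P):
    """Hypothetical affine map of the plane: linear part [[0, -1], [1, 0]], shift (2, 3)."""
    x, y = P
    return (-y + 2, x + 3)

def linear_part(v):
    """The linear part of f applied to a vector v."""
    x, y = v
    return (-y, x)

def vec(P, Q):
    """Coordinates of the vector PQ."""
    return tuple(q - p for p, q in zip(P, Q))

O, O_new = (0, 0), (4, 1)
a = vec(O, f(O))                 # a = vector from O to f(O)
c = vec(O, O_new)                # c = vector from O to O'
Fc = linear_part(c)

# Formula (8.21): a' = a + F(c) - c.
a_new = tuple(ai + fi - ci for ai, fi, ci in zip(a, Fc, c))

# Direct computation: a' = vector from O' to f(O').
a_direct = vec(O_new, f(O_new))
print(a_new, a_direct)
```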

Let us choose a frame of reference in the affine space $(V, L)$. Let us recall that it is written in the form $(O; e_1, \ldots, e_n)$ or $(O; A_1, \ldots, A_n)$, where $e_i = \overrightarrow{OA_i}$. Let $f$ be a nonsingular transformation of $V$ into itself, and let it map the frame of reference $(O; e_1, \ldots, e_n)$ to $(O'; e'_1, \ldots, e'_n)$. If $e'_i = \overrightarrow{O'A'_i}$, then this implies that $f(O) = O'$ and $f(A_i) = A'_i$ for $i = 1, \ldots, n$.

Let the point $A \in V$ have coordinates $(\alpha_1, \ldots, \alpha_n)$ relative to the frame of reference $(O; A_1, \ldots, A_n)$. This means that the vector $\overrightarrow{OA}$ is equal to $\alpha_1 e_1 + \cdots + \alpha_n e_n$. Then the point $f(A)$ determines the vector $\overrightarrow{f(O)f(A)}$, that is, $\mathcal{F}(\overrightarrow{OA})$. And this vector obviously has, in the basis $e'_1, \ldots, e'_n$, the same coordinates as the vector $\overrightarrow{OA}$ in the basis $e_1, \ldots, e_n$, since by definition, $e'_i = \mathcal{F}(e_i)$. Thus the affine transformation $f$ is defined by the fact that the point $A$ is mapped to a point $f(A)$ having in the frame of reference $(O'; e'_1, \ldots, e'_n)$ the same coordinates as the point $A$ had in the frame of reference $(O; e_1, \ldots, e_n)$.

Definition 8.30 Two subsets $S$ and $S'$ of an affine space $V$ are said to be affinely equivalent if there exists a nonsingular affine transformation $f : V \to V$ such that $f(S) = S'$.

The previous reasoning shows that this definition is equivalent to saying that in the space $V$, there exist two frames of reference $(O; e_1, \ldots, e_n)$ and $(O'; e'_1, \ldots, e'_n)$ such that all points of the set $S$ have the same coordinates with respect to the first frame of reference as the points of the set $S'$ have with respect to the second.

In the case of real affine spaces, the definition of affine transformations by formulas (8.17) and (8.18) makes it possible to apply to them Theorem 4.39 on proper and improper linear transformations.

Definition 8.31 A nonsingular affine transformation of a real affine space $V$ to itself is said to be proper if its linear part is a proper transformation of the vector space. Otherwise, it is called improper.

Thus by this definition, we consider translations to be proper transformations. A bit later, we shall provide a more meaningful justification for this definition.

By the given definition, whether an affine transformation $f$ is proper or improper depends on the sign of the determinant of the matrix $F = (f_{ij})$ in formulas (8.17), (8.18). We observe that this concept relates only to nonsingular transformations of $V$ into itself, since in formulas (8.17) and (8.18), we must have $m = n$.

In order to formulate an analogue to Theorem 4.39, we should make precise the assertion that a family $g(t)$ of affine transformations depends continuously on the parameter $t$. By this, we shall understand that for $g(t)$, in the formula

$$\alpha'_i = \sum_{j=1}^{n} g_{ij}(t)\alpha_j + \beta_i(t), \quad i = 1, \ldots, n, \tag{8.22}$$

analogous to (8.17), written in some (arbitrarily chosen) frame of reference of the space $V$, all coefficients $g_{ij}(t)$ and $\beta_i(t)$ depend continuously on $t$. In particular, if $G(t) = (g_{ij}(t))$ is the matrix of the linear part of the affine transformation $g(t)$, then its determinant $|G(t)|$ is a continuous function. From the properties of continuous functions, it follows that if all the transformations $g(t)$ are nonsingular, then the determinant $|G(t)|$ has the same sign at all points of the interval $[0, 1]$.

Thus we shall say that an affine transformation $f$ is continuously deformable into $h$ if there exists a family $g(t)$ of nonsingular affine transformations, depending continuously on the parameter $t \in [0, 1]$, such that $g(0) = f$ and $g(1) = h$. It is obvious that the property thus defined of affine transformations being continuously deformable into each other defines on the set of such transformations an equivalence relation, that is, it satisfies the properties of reflexivity, symmetry, and transitivity.

Theorem 8.32 Two nonsingular affine transformations of a real affine space are continuously deformable into each other if and only if they are either both proper or both improper. In particular, a nonsingular affine transformation $f$ is proper if and only if it is deformable into the identity.

Proof Let us begin with the latter, more specific, assertion of the theorem. Let a nonsingular affine transformation $f$ be continuously deformable into $e$. Then by symmetry, there exists a continuous family of nonsingular affine transformations $g(t)$ with linear part $\Lambda(g(t))$ such that $g(0) = e$ and $g(1) = f$. For the transformation $g(t)$, let us write (8.22) in some frame of reference $(O; e_1, \ldots, e_n)$ of the space $V$. It is obvious that for the matrix $G(t) = (g_{ij}(t))$, we have the relationships $G(0) = E$ and $G(1) = F$, where $F$ is the matrix of the linear transformation $\mathcal{F} = \Lambda(f)$ in the basis $e_1, \ldots, e_n$ of the space $L$, and $\beta_i(0) = 0$ for all $i = 1, \ldots, n$. By the definition of continuous deformation, the determinant $|G(t)|$ is nonzero for all $t \in [0, 1]$. Since $|G(0)| = |E| = 1$, it follows that $|G(t)| > 0$ for all $t \in [0, 1]$, and in particular, for $t = 1$. And this means that $|\Lambda(f)| = |G(1)| > 0$. Thus the linear transformation $\Lambda(f)$ is proper, and by definition, the affine transformation $f$ is also proper.

Conversely, let $f$ be a proper affine transformation. This means that the linear transformation $\Lambda(f)$ is proper. Then by Theorem 4.39, the transformation $\Lambda(f)$ is continuously deformable into the identity. Let $\mathcal{G}(t)$ be a family of linear transformations such that $\mathcal{G}(0) = \mathcal{E}$ and $\mathcal{G}(1) = \Lambda(f)$, given in some basis $e_1, \ldots, e_n$ of the space $L$ by the formula

$$\alpha'_i = \sum_{j=1}^{n} g_{ij}(t)\alpha_j, \quad i = 1, \ldots, n, \tag{8.23}$$

where $g_{ij}(t)$ are continuous functions, the matrix $G(t) = (g_{ij}(t))$ is nonsingular for all $t \in [0, 1]$, and we have the equalities $G(0) = E$, $G(1) = F$, where $F$ is the matrix of the transformation $\Lambda(f)$ in the same basis $e_1, \ldots, e_n$.

Let us consider the family $g(t)$ of affine transformations given in the frame of reference $(O; e_1, \ldots, e_n)$ by the formula

$$\alpha'_i = \sum_{j=1}^{n} g_{ij}(t)\alpha_j + \beta_i t, \quad i = 1, \ldots, n,$$

in which the coefficients $g_{ij}(t)$ are taken from formula (8.23), while the coefficients $\beta_i$ are from formula (8.17) for the transformation $f$ in the same frame of reference $(O; e_1, \ldots, e_n)$. Since $\mathcal{G}(0) = \mathcal{E}$ and $\mathcal{G}(1) = \Lambda(f)$, it is obvious that $g(0) = e$ and $g(1) = f$, and moreover, $|G(t)| > 0$ for all $t \in [0, 1]$, that is, the transformation $g(t)$ is nonsingular for all $t \in [0, 1]$.

From this it follows by transitivity that every pair of proper affine transformations are continuously deformable into each other.

The case of improper affine transformations is handled completely analogously. It is necessary only to note that in all the arguments above, one must replace the identity transformation $\mathcal{E}$ by some fixed improper linear transformation of the space $L$. □
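The deformation used in the proof can be written out explicitly in a simple case. The Python sketch below is an illustration only, not part of the original text. It assumes a proper affine transformation of the plane whose linear part is a rotation, for which the path of rotations through the angles $t\varphi$ can serve as the family from Theorem 4.39. The names `PHI`, `BETA`, `G`, `g`, `det` and the numerical values are hypothetical.

```python
import math

PHI, BETA = 2.0, (3.0, -1.0)   # hypothetical rotation angle and shift of f

def G(t):
    """Matrix of the linear part of g(t): rotation through the angle t*PHI."""
    c, s = math.cos(t * PHI), math.sin(t * PHI)
    return [[c, -s], [s, c]]

def g(t, P):
    """Apply g(t), that is, alpha' = G(t) alpha + t * beta."""
    M = G(t)
    return tuple(M[i][0] * P[0] + M[i][1] * P[1] + t * BETA[i] for i in range(2))

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# The determinant stays positive along the whole path from e to f.
dets = [det(G(k / 10)) for k in range(11)]
print(min(dets) > 0, g(0, (1.0, 2.0)))
```

At $t = 0$ the family gives the identity transformation, and at $t = 1$ it gives the rotation through the angle `PHI` followed by the shift `BETA`.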

Theorem 8.32 shows that analogously to real vector spaces, in every real affine 
space there exist two orientations, from which we may select arbitrarily whichever 
one we wish. 


8.4 Affine Euclidean Spaces and Motions 

Definition 8.33 An affine space $(V, L)$ is called an affine Euclidean space if the vector space $L$ is a Euclidean space.

This means that for every pair of vectors $x, y \in L$ there is defined a scalar product $(x, y)$ satisfying the conditions enumerated in Sect. 7.1. In particular, $(x, x) \geq 0$ for all $x \in L$ and there is a definition of the length $|x| = \sqrt{(x, x)}$ of a vector $x$. Since every pair of points $A, B \in V$ defines a vector $\overrightarrow{AB} \in L$, it follows that one can associate with every pair of points $A$ and $B$ the number

$$r(A, B) = |\overrightarrow{AB}|,$$

called the distance between the points $A$ and $B$ in $V$. This notion of distance that we have introduced satisfies the conditions for a metric introduced on p. xvii:

(1) $r(A, B) > 0$ for $A \neq B$ and $r(A, A) = 0$;

(2) $r(A, B) = r(B, A)$ for every pair of points $A$ and $B$;

(3) for every three points $A$, $B$, and $C$, the triangle inequality is satisfied:

$$r(A, C) \leq r(A, B) + r(B, C). \tag{8.24}$$

Properties (1) and (2) clearly follow from the properties of the scalar product. Let us prove inequality (8.24), a special case of which (for right triangles) was proved on p. 216. By definition, if $\overrightarrow{AB} = x$ and $\overrightarrow{BC} = y$, then (8.24) is equivalent to the inequality

$$|x + y| \leq |x| + |y|. \tag{8.25}$$

Since there are nonnegative numbers on the left- and right-hand sides of (8.25), we can square both sides and obtain an equivalent inequality, which we shall prove:

$$|x + y|^2 \leq (|x| + |y|)^2. \tag{8.26}$$

Since

$$|x + y|^2 = (x + y, x + y) = |x|^2 + 2(x, y) + |y|^2,$$

then after multiplying out the right-hand side of (8.26), we can rewrite this inequality in the form

$$|x|^2 + 2(x, y) + |y|^2 \leq |x|^2 + 2|x| \cdot |y| + |y|^2.$$

Subtracting like terms from the left- and right-hand sides, we arrive at the inequality

$$(x, y) \leq |x| \cdot |y|,$$

which is the Cauchy-Schwarz inequality (7.6).

Thus an affine Euclidean space is a metric space.

In Sect. 8.1, we defined a frame of reference of an affine space as a point $O$ in $V$ and a basis $e_1, \ldots, e_n$ in $L$. If our affine space $(V, L)$ is a Euclidean space, and the basis $e_1, \ldots, e_n$ is orthonormal, then the frame of reference $(O; e_1, \ldots, e_n)$ is also said to be orthonormal. We see that an orthonormal frame of reference can be associated with each point $O \in V$.

Definition 8.34 A mapping $g : V \to V$ of an affine Euclidean space $V$ into itself is said to be a motion if it is an isometry of $V$ as a metric space, that is, if it preserves distances between points. This means that for every pair of points $A, B \in V$, the following equality holds:

$$r(g(A), g(B)) = r(A, B). \tag{8.27}$$

Let us emphasize that in this definition, we are speaking about an arbitrary mapping $g : V \to V$, which in general does not have to be an affine transformation. By the discussion presented on p. xxi, a mapping $g : V \to V$ is a motion if its image is $g(V) = V$ and it satisfies the condition (8.27) of preserving distances.

Example 8.35 Let $a$ be a vector in the vector space $L$ corresponding to the affine space $V$. Then the translation $T_a$ is a motion. Indeed, by the definition of a translation, for every point $A \in V$ we have the equality $T_a(A) = B$, where $\overrightarrow{AB} = a$. If for some other point $C$, we have an analogous equality $T_a(C) = D$, then $\overrightarrow{CD} = a$. Thus we have the equality $\overrightarrow{AB} = \overrightarrow{CD}$, from which, by Remark 8.2, it follows that $\overrightarrow{AC} = \overrightarrow{BD}$. This means that $|\overrightarrow{AC}| = |\overrightarrow{BD}|$, or equivalently, $r(A, C) = r(T_a(A), T_a(C))$, as asserted.

Example 8.36 Let us assume that the mapping $g : V \to V$ has the fixed point $O$, that is, the point $O \in V$ satisfies the equality $g(O) = O$. As we saw in Sect. 8.3, the choice of point $O$ determines a bijective mapping $V \to L$, where $L$ is the space of vectors of the affine space $V$. Here to a point $A \in V$ corresponds the vector $\overrightarrow{OA} \in L$. Thus the mapping $g : V \to V$ defines a mapping $\mathcal{G} : L \to L$ such that $\mathcal{G}(0) = 0$. Let us emphasize that since we did not assume that the mapping $g$ was an affine transformation, the mapping $\mathcal{G}$, in general, is not a linear transformation of the space $L$. Now let us check that if $\mathcal{G}$ is a linear orthogonal transformation of the Euclidean space $L$, then $g$ is a motion.

By definition, the transformation $\mathcal{G}$ is defined by the condition $\mathcal{G}(\overrightarrow{OA}) = \overrightarrow{Og(A)}$. We must prove that $g$ is a motion, that is, that for all pairs of points $A$ and $B$, we have

$$|\overrightarrow{g(A)g(B)}| = |\overrightarrow{AB}|. \tag{8.28}$$

We have the equality $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA}$, and we obtain that

$$\overrightarrow{g(A)g(B)} = \overrightarrow{g(A)O} + \overrightarrow{Og(B)} = \overrightarrow{Og(B)} - \overrightarrow{Og(A)},$$

and this vector, by the definition of the transformation $\mathcal{G}$, is equal to $\mathcal{G}(\overrightarrow{OB}) - \mathcal{G}(\overrightarrow{OA})$. In view of the fact that the transformation $\mathcal{G}$ is assumed to be linear, this vector is equal to $\mathcal{G}(\overrightarrow{OB} - \overrightarrow{OA})$. But as we have seen, $\overrightarrow{OB} - \overrightarrow{OA} = \overrightarrow{AB}$, and this means that

$$\overrightarrow{g(A)g(B)} = \mathcal{G}(\overrightarrow{AB}).$$

From the orthogonality of the transformation $\mathcal{G}$ it follows that $|\mathcal{G}(\overrightarrow{AB})| = |\overrightarrow{AB}|$. In combination with the previous relationships, this yields the required equality (8.28).

The concept of motion is the most natural mathematical abstraction corresponding to the idea of the displacement of a solid body in space. We may apply to its analysis all of the results obtained in the preceding chapters, on the basis of the following fundamental assertion.

Theorem 8.37 Every motion is an affine transformation. 


Proof Let $f$ be a motion of the affine Euclidean space $V$. As a first step, let us choose in $V$ an arbitrary point $O$ and consider the vector $a = \overrightarrow{Of(O)}$ and the mapping $g = T_{-a} f$ of the space $V$ into itself. Here the product $T_{-a} f$, as usual, denotes sequential application (composition) of the mappings $f$ and $T_{-a}$. Then $O$ is a fixed point of the transformation $g$, that is, $g(O) = O$. Indeed, $g(O) = T_{-a}(f(O))$, and by the definition of translation, the equality $g(O) = O$ is equivalent to $\overrightarrow{f(O)O} = -a$, and this clearly follows from the fact that $a = \overrightarrow{Of(O)}$.

We now observe that the product (that is, the sequential application, or composition) of two motions $g_1$ and $g_2$ is also a motion; the verification of this follows at once from the definition. Since we know that every translation is a motion (see Example 8.35), it follows that $g = T_{-a} f$ is also a motion. We therefore obtain a representation of $f$ in the form $f = T_a g$, where $g$ is a motion and $g(O) = O$. Thus as we saw in Example 8.36, $g$ defines a mapping $\mathcal{G}$ of the space $L$ into itself. The main part of the proof consists in verifying that $\mathcal{G}$ is a linear transformation.

We shall base this verification on the following simple proposition.

Lemma 8.38 Assume that we are given a mapping $\mathcal{G}$ of a vector space $L$ into itself and a basis $e_1, \ldots, e_n$ of $L$. Let us set $\mathcal{G}(e_i) = e'_i$, $i = 1, \ldots, n$, and assume that for every vector

$$x = \alpha_1 e_1 + \cdots + \alpha_n e_n, \tag{8.29}$$

its image

$$\mathcal{G}(x) = \alpha_1 e'_1 + \cdots + \alpha_n e'_n \tag{8.30}$$

has the same $\alpha_1, \ldots, \alpha_n$. Then $\mathcal{G}$ is a linear transformation.


Proof We must verify two conditions that enter into the definition of a linear transformation:

(a) $\mathcal{G}(x + y) = \mathcal{G}(x) + \mathcal{G}(y)$,

(b) $\mathcal{G}(\alpha x) = \alpha \mathcal{G}(x)$,

for all vectors $x$ and $y$ and every scalar $\alpha$.

The verification of this is trivial. (a) Let the vectors $x$ and $y$ be given by $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and $y = \beta_1 e_1 + \cdots + \beta_n e_n$. Then their sum is given by

$$x + y = (\alpha_1 + \beta_1)e_1 + \cdots + (\alpha_n + \beta_n)e_n.$$

On the other hand, by the condition of the lemma, we have

$$\mathcal{G}(x + y) = (\alpha_1 + \beta_1)e'_1 + \cdots + (\alpha_n + \beta_n)e'_n = (\alpha_1 e'_1 + \cdots + \alpha_n e'_n) + (\beta_1 e'_1 + \cdots + \beta_n e'_n) = \mathcal{G}(x) + \mathcal{G}(y).$$

(b) For the vector $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and an arbitrary scalar $\alpha$, we have

$$\alpha x = (\alpha\alpha_1)e_1 + \cdots + (\alpha\alpha_n)e_n.$$

By the condition of the lemma,

$$\mathcal{G}(\alpha x) = (\alpha\alpha_1)e'_1 + \cdots + (\alpha\alpha_n)e'_n = \alpha(\alpha_1 e'_1 + \cdots + \alpha_n e'_n) = \alpha\mathcal{G}(x). \qquad \square$$

We now return to the proof of Theorem 8.37. Let us verify that the mapping $\mathcal{G} : L \to L$ constructed above satisfies the condition of the lemma. To this end, let us first ascertain that it preserves the inner product in $L$, that is, that for all vectors $x, y \in L$, we have the equality

$$(\mathcal{G}(x), \mathcal{G}(y)) = (x, y). \tag{8.31}$$

Let us recall that the property for the transformation $g$ to be a motion can be formulated as the following condition on the transformation $\mathcal{G}$ of the vector space $L$:

$$|\mathcal{G}(x) - \mathcal{G}(y)| = |x - y| \tag{8.32}$$

for all pairs of vectors $x$ and $y$. Squaring both sides of equality (8.32), we obtain

$$|\mathcal{G}(x) - \mathcal{G}(y)|^2 = |x - y|^2. \tag{8.33}$$

Since $x$ and $y$ are vectors in the Euclidean space $L$, we have

$$|x - y|^2 = |x|^2 - 2(x, y) + |y|^2,$$
$$|\mathcal{G}(x) - \mathcal{G}(y)|^2 = |\mathcal{G}(x)|^2 - 2(\mathcal{G}(x), \mathcal{G}(y)) + |\mathcal{G}(y)|^2.$$

Putting these expressions into equality (8.33), we find that

$$|\mathcal{G}(x)|^2 - 2(\mathcal{G}(x), \mathcal{G}(y)) + |\mathcal{G}(y)|^2 = |x|^2 - 2(x, y) + |y|^2. \tag{8.34}$$

Setting the vector $y$ equal to $0$ in relationship (8.34), and taking into account that $\mathcal{G}(0) = 0$, we obtain the equality $|\mathcal{G}(x)| = |x|$ for all $x \in L$. Finally, taking into account the relationships $|\mathcal{G}(x)| = |x|$ and $|\mathcal{G}(y)| = |y|$, from (8.34) follows the required equality (8.31).

Thus for any orthonormal basis $e_1, \ldots, e_n$, the vectors $e'_1, \ldots, e'_n$ defined by the relationships $\mathcal{G}(e_i) = e'_i$ also form an orthonormal basis. The coordinates of the vector $x = x_1 e_1 + \cdots + x_n e_n$ in the basis $e_1, \ldots, e_n$ are given by the formula $x_i = (x, e_i)$. From this and (8.31) we obtain that $(\mathcal{G}(x), e'_i) = (\mathcal{G}(x), \mathcal{G}(e_i)) = (x, e_i) = x_i$, and this implies that

$$\mathcal{G}(x) = x_1 e'_1 + \cdots + x_n e'_n,$$

that is, the constructed mapping $\mathcal{G} : L \to L$ satisfies the condition of the lemma. From this it follows that $\mathcal{G}$ is a linear transformation of the space $L$, and by property (8.31), it is an orthogonal transformation. □
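The steps of this proof can be traced on an example. The Python sketch below is an illustration only, not part of the original text. It assumes the coordinate plane with the standard scalar product and takes as $f$ a hypothetical motion built from the rotation with $\cos\varphi = 3/5$, $\sin\varphi = 4/5$, so that all computations are exact. The names `f`, `g`, `sub`, `dot` are hypothetical helpers. The sketch recovers the vector $a$, passes to $g = T_{-a} f$, and checks that the images of an orthonormal basis are again orthonormal.

```python
from fractions import Fraction

def f(P):
    """Hypothetical motion: rotation with cos = 3/5, sin = 4/5, then a shift by (1, -2)."""
    x, y = P
    return (Fraction(3, 5) * x - Fraction(4, 5) * y + 1,
            Fraction(4, 5) * x + Fraction(3, 5) * y - 2)

def sub(P, Q):
    """Coordinates of the vector QP, that is, P - Q."""
    return tuple(p - q for p, q in zip(P, Q))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

O = (0, 0)
a = sub(f(O), O)                      # a = vector from O to f(O)

def g(P):
    """g = T_{-a} f, a motion with fixed point O."""
    return sub(f(P), a)

e = [(1, 0), (0, 1)]                  # orthonormal basis
e_img = [sub(g(v), O) for v in e]     # images of the basis vectors under G

# Gram matrix of the images: the identity matrix means they are orthonormal.
gram = [[dot(u, v) for v in e_img] for u in e_img]

A, B = (2, 7), (-1, 3)
dist_sq_before = dot(sub(A, B), sub(A, B))
dist_sq_after = dot(sub(f(A), f(B)), sub(f(A), f(B)))
print(gram, dist_sq_before, dist_sq_after)
```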


Let us note that along the way, we have proved the possibility of expressing an arbitrary motion $f$ in the form of the product

$$f = T_a g, \tag{8.35}$$

where $T_a$ is a translation, and $g$ has a fixed point $O$ and corresponds to some orthogonal transformation $\mathcal{G}$ of the space $L$ (see Example 8.36). From the representation (8.35) and results of Sect. 8.3, it follows that any two orthonormal frames of reference can be mapped into each other by a motion, and moreover, this motion is unique.

For studying motions, we may make use of the structure of orthogonal transformations already investigated in Sect. 7.2, that is, Theorem 7.27. By this theorem, for every orthogonal transformation, in particular, for the transformation $\mathcal{G}$ associated with the motion $g$ in formula (8.35), there exists an orthonormal basis in which the matrix of the transformation $\mathcal{G}$ is in block-diagonal form:

$$\begin{pmatrix}
1 & & & & & & & & \\
& \ddots & & & & & & 0 & \\
& & 1 & & & & & & \\
& & & -1 & & & & & \\
& & & & \ddots & & & & \\
& & & & & -1 & & & \\
& & & & & & G_{\varphi_1} & & \\
& 0 & & & & & & \ddots & \\
& & & & & & & & G_{\varphi_r}
\end{pmatrix} \tag{8.36}$$

where

$$G_{\varphi_i} = \begin{pmatrix} \cos\varphi_i & -\sin\varphi_i \\ \sin\varphi_i & \cos\varphi_i \end{pmatrix} \tag{8.37}$$

and $\varphi_i \neq \pi k$, $k \in \mathbb{Z}$. Two instances of the number $-1$ on the main diagonal of the matrix (8.36) can be replaced by the matrix $G_\varphi$ of the form (8.37) with $\varphi = \pi$, so that it is possible to assume that in the matrix (8.36), the number $-1$ is absent or is encountered exactly one time, and in this case, $0 < \varphi_i < 2\pi$. Under such a convention, we obtain that if the transformation $\mathcal{G}$ is proper, then the number $-1$ does not appear on the main diagonal, while if $\mathcal{G}$ is improper, there is exactly one such occurrence.

From the aforesaid, it follows that in the case of a proper transformation $> of the 
space L of dimension n, we hâve the orthogonal décomposition 


L = Lo 0 Li ® • • • ® U, where L/ _L L / for ail i / j, (8.38) 

where ail subspaces Lo, . . . , L& are invariant with respect to the transformation $, 
and dimLo — n — 2k, dimL/ —2 for ail i = 1, . . . , k. The restriction of $ to Lo 
is the identity transformation, while the restriction of $ to the subspace L, with 
i = 1 , . . . , k is a rotation through the angle (pi . 

But if the transformation $ is improper, then on the main diagonal of the ma- 
trix (8.36) the number — 1 is encountered once. Then in the orthogonal décomposi- 
tion (8.38), there is added one additional one-dimensional term L^+i, in which the 
transformation $ takes each vector x to the opposite vector —x. The orthogonal 
décomposition of the space L into a sum of subspaces invariant with respect to the 
transformation $ takes the form 


L = Lo ® Li ® • • • ® Lk 0 La + 1 , where L / _L L j for ail i ^ j , (8.39) 

where dim L/ = 2 for i = 1 , . . . , k, dim Lq — n — 2k— 1 , and dim L^ + i = 1 . 
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Now we shall make use of the arbitrariness in the sélection of O in the représen- 
tation (8.35) of the motion /. By formula (8.21), for a change in the point O, the 
vector a in (8.35) is replaced by the vector a + fy(c) — c, where for c, one can take 
an arbitrary vector of the space L. We hâve the représentation 

c — cq -\- c\ H- • • • + Ck, Ci g L/, (8.40) 

in the case of the décomposition (8.38), or else we hâve 

c — cq-\-c\-\- bCfc + Cfc+i, Ci G L/, (8.41) 

in the case of the décomposition (8.39). 

Since $\mathcal{U}(x) = x$ for every vector $x \in \mathsf{L}_0$, the term $c_0$ makes no contribution to
the vector $\mathcal{U}(c) - c$ added to $a$. For $i > 0$, the situation is precisely the reverse:
the transformation $\mathcal{U} - \mathcal{E}$ defines a nonsingular transformation in $\mathsf{L}_i$. This follows
from the fact that the kernel of the transformation $\mathcal{U} - \mathcal{E}$ in $\mathsf{L}_i$ is equal to $(0)$,
which is obvious for a rotation through the angle $\varphi_i$, $0 < \varphi_i < 2\pi$, in the plane
and for the transformation $x \mapsto -x$ on a line. Therefore, the image of the
transformation $\mathcal{U} - \mathcal{E}$ in $\mathsf{L}_i$ is equal to the entire subspace $\mathsf{L}_i$ for $i > 0$. That is,
every vector $a_i \in \mathsf{L}_i$ can be represented in the form $a_i = \mathcal{U}(c_i) - c_i$, where $c_i$ is
some other vector of the same space $\mathsf{L}_i$, $i > 0$.

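Thus in accordance with the representations (8.40) and (8.41), the vector $a$ can
be written in the form $a = a_0 + a_1 + \cdots + a_k$ or $a = a_0 + a_1 + \cdots + a_k + a_{k+1}$,
with $a_i \in \mathsf{L}_i$, depending on whether the transformation $\mathcal{U}$ is proper or improper.
Since every vector of $\mathsf{L}_i$ with $i > 0$ lies in the image of $\mathcal{U} - \mathcal{E}$, we may choose
vectors $c_i \in \mathsf{L}_i$ such that $\mathcal{U}(c_i) - c_i = -a_i$ for $i > 0$, and define $c$ by relationship
(8.40) or (8.41), respectively. As a result, we obtain the equality

$$a + \mathcal{U}(c) - c = a_0,$$

meaning that by our selection of the point $O$, we can arrange that the vector $a$ is
contained in the subspace $\mathsf{L}_0$.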

We have thus proved the following theorem.

Theorem 8.39 Every motion $f$ of an affine Euclidean space $V$ can be represented
in the form

$$f = T_a g, \tag{8.42}$$

where the transformation $g$ has fixed point $O$ and corresponds to the orthogonal
transformation $\mathcal{U} = \Lambda(g)$, while $T_a$ is a translation by the vector $a$ such that
$\mathcal{U}(a) = a$.

Let us consider the most visual example, that of the “physical” three-dimensional
space in which we live. Here there are two possible cases.

Case 1: The motion $f$ is proper. Then the orthogonal transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ is
also proper. Since $\dim \mathsf{L} = 3$, the decomposition (8.38) has the form

$$\mathsf{L} = \mathsf{L}_0 \oplus \mathsf{L}_1, \quad \mathsf{L}_0 \perp \mathsf{L}_1,$$

where $\dim \mathsf{L}_0 = 1$ and $\dim \mathsf{L}_1 = 2$. The transformation $\mathcal{U}$ leaves vectors in $\mathsf{L}_0$ fixed
and defines a rotation through the angle $0 < \varphi < 2\pi$ in the plane $\mathsf{L}_1$. Representation
(8.42) shows that the transformation $f$ can be obtained as a rotation through the
angle $\varphi$ about the line $\mathsf{L}_0$ and a translation in the direction of $\mathsf{L}_0$; see Fig. 8.5.

Fig. 8.5 A proper motion
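
The reduction of a proper motion to a rotation about an axis plus a translation along that axis can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes an orthonormal basis in which $\mathsf{L}_0$ is the third coordinate axis and $\mathsf{L}_1$ is the plane of the first two, and the function names `rotation_about_z`, `shift_of_origin`, and `new_translation` are hypothetical. It solves $(\mathcal{U} - \mathcal{E})c = -(a_1, a_2, 0)$ in the plane $\mathsf{L}_1$ and returns the new translation vector $a + \mathcal{U}(c) - c$.

```python
import math


def rotation_about_z(phi):
    """Matrix of the rotation through phi about the third axis (assumed to be L_0)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(m, v):
    """Apply a 3x3 matrix to a vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def shift_of_origin(phi, a):
    """Return c in the plane L_1 with (U - E)c = -(a_1, a_2, 0)."""
    c, s = math.cos(phi), math.sin(phi)
    # Restriction of U - E to the plane L_1 spanned by the first two basis vectors.
    m11, m12, m21, m22 = c - 1.0, -s, s, c - 1.0
    det = m11 * m22 - m12 * m21  # equals 2 - 2*cos(phi), nonzero for 0 < phi < 2*pi
    r1, r2 = -a[0], -a[1]
    # Cramer's rule for the 2x2 system.
    x = (r1 * m22 - m12 * r2) / det
    y = (m11 * r2 - m21 * r1) / det
    return [x, y, 0.0]


def new_translation(phi, a):
    """Translation vector a + U(c) - c after moving the point O by c."""
    u = rotation_about_z(phi)
    c = shift_of_origin(phi, a)
    uc = apply(u, c)
    return [a[i] + uc[i] - c[i] for i in range(3)]
```

With these assumptions, the first two components of `new_translation(phi, a)` vanish up to rounding error, while the third component of `a` is unchanged, in agreement with the condition $\mathcal{U}(a) = a$ of Theorem 8.39.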

This result can be given a different formulation. Suppose a solid body executes an
arbitrarily complex motion over time. Then its initial position can be superimposed
on its final position by a rotation around some axis and a translation along that
axis. Indeed, since it is a solid body, its final position is obtained from the initial
position by some motion $f$. Since this change in position is obtained as a continuous
motion, it follows that it is proper. Thus we may employ the three-dimensional case
of Theorem 8.39. This result is known as Euler’s theorem.

Case 2: The motion $f$ is improper. Then the orthogonal transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ is
also improper. Since $\dim \mathsf{L} = 3$, the decomposition (8.39) has the form

$$\mathsf{L} = \mathsf{L}_0 \oplus \mathsf{L}_1 \oplus \mathsf{L}_2, \quad \mathsf{L}_i \perp \mathsf{L}_j,$$

where $\mathsf{L}_0 = (0)$, $\dim \mathsf{L}_1 = 2$, and $\dim \mathsf{L}_2 = 1$. The transformation $\mathcal{U}$ defines a rotation
through the angle $0 < \varphi < 2\pi$ in the plane $\mathsf{L}_1$ and carries each vector on the line $\mathsf{L}_2$
into its opposite. From this it follows that the equality $\mathcal{U}(a) = a$ holds only for
the vector $a = 0$, and therefore, the translation $T_a$ in formula (8.42) is equal to the
identity transformation. Therefore, the motion $f$ always has the fixed point $O$, and
can be obtained as a rotation through the angle $0 < \varphi < 2\pi$ in the plane $\mathsf{L}_1$ passing
through this point followed by a reflection in the plane $\mathsf{L}_1$.

The theory of motions in an affine Euclidean space can be given a more graphical
form if we employ the notion of flags, which was introduced in Sect. 8.2 (p. 300).
First, it is clear that a motion of a space carries a flag to a flag. The main result,
which we in fact have already proved, can be formulated as follows.

Theorem 8.40 For every pair of flags, there exists a motion taking the first flag to
the second, and such a motion is unique.

Proof To prove the theorem, we observe that for an arbitrary flag

$$V_0 \subset V_1 \subset \cdots \subset V_n = V, \tag{8.43}$$

the affine subspace $V_0$ consists by definition of a single point. Setting $V_0 = O$, we
may identify each subspace $V_i$ with the subspace $\mathsf{L}_i \subset \mathsf{L}$, where $\mathsf{L}_i$ is the space of
vectors of the affine space $V_i$. Here the sequence

$$\mathsf{L}_0 \subset \mathsf{L}_1 \subset \cdots \subset \mathsf{L}_n = \mathsf{L} \tag{8.44}$$

defines a flag in $\mathsf{L}$. On the other hand, we saw in Sect. 7.2 that the flag (8.44)
is uniquely associated with an orthonormal basis $e_1, \ldots, e_n$ in $\mathsf{L}$. Thus
$\mathsf{L}_i = \langle e_1, \ldots, e_i \rangle$ and $e_i \in \mathsf{L}_i^{+}$, as established in Sect. 7.2. This means that the
flag (8.43) is uniquely determined by some orthonormal frame of reference
$(O; e_1, \ldots, e_n)$ in $V$. As we noted above, for two orthonormal frames of reference,
there exists a unique motion of the space $V$ taking the first frame of reference to the
second. This holds, then, for two flags of the form (8.43), which proves the assertion
of the theorem. □

The property proved in Theorem 8.40 is called “free mobility” of an affine
Euclidean space. In the case of three-dimensional space, this assertion is a
mathematical expression of the fact that in space, a solid body can be arbitrarily
translated and rotated.

In an affine Euclidean space, the distance $r(A, B)$ between any two points does
not change under a motion of the space. In a general affine space it is impossible to
associate with each pair of points a number that would be invariant under every
nonsingular affine transformation. This follows from the fact that for an arbitrary
pair of points $A, B$ and another arbitrary pair $A', B'$, there exists an affine
transformation $f$ taking $A$ to $A'$ and $B$ to $B'$.

To prove this, let us write down a transformation $f$ according to formula (8.19)
in the form $f = T_a f_0$, choosing the point $A$ as the point $O$. Here $A$ is a fixed point
of the affine transformation $f_0$, that is, $f_0(A) = A$. The transformation $f_0$ is defined
by some linear transformation $\mathcal{F}$ of the space of vectors $\mathsf{L}$ of our affine space $V$ and is
uniquely defined by the relation

$$\overrightarrow{A f_0(C)} = \mathcal{F}(\overrightarrow{AC}), \quad C \in V.$$

Then the condition $f(A) = A'$ will be satisfied if we set $a = \overrightarrow{AA'}$. It remains to
select a linear transformation $\mathcal{F} : \mathsf{L} \to \mathsf{L}$ so as to satisfy the equality $f(B) = B'$,
that is, $T_a f_0(B) = B'$, which is equivalent to the relationship

$$f_0(B) = T_{-a}(B'). \tag{8.45}$$

We set the vector $x$ equal to $\overrightarrow{AB}$ (under the condition $A \ne B$, whence $x \ne 0$) and
consider the point $P = T_{-a}(B')$ and the vector $y = \overrightarrow{AP}$. Then the relationship (8.45) is
equivalent to the equality $\mathcal{F}(x) = y$. It remains only to find a linear transformation
$\mathcal{F} : \mathsf{L} \to \mathsf{L}$ for which the condition $\mathcal{F}(x) = y$ is satisfied for given vectors $x$ and $y$,
with $x \ne 0$. For this, we must extend the vector $x$ to a basis of the space $\mathsf{L}$ and define
$\mathcal{F}$ on the vectors of this basis arbitrarily, provided only that the condition
$\mathcal{F}(x) = y$ is satisfied.
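
The construction just described can be tried out in the plane. The following is a sketch under stated assumptions rather than the book's own procedure: points and vectors are pairs of rational numbers, the helper names `complement`, `linear_map_sending`, and `affine_map` are hypothetical, and the "arbitrary" image of the second basis vector is chosen here so that $\mathcal{F}$ stays nonsingular whenever $y \ne 0$.

```python
from fractions import Fraction


def complement(v):
    """A standard basis vector of the plane that is not proportional to v != 0."""
    return [Fraction(0), Fraction(1)] if v[0] != 0 else [Fraction(1), Fraction(0)]


def linear_map_sending(x, y):
    """Return a 2x2 matrix F (list of rows) with F x = y, for x != 0."""
    x = [Fraction(t) for t in x]
    y = [Fraction(t) for t in y]
    if x == [0, 0]:
        raise ValueError("x must be nonzero, that is, A and B must be distinct")
    w = complement(x)  # extends x to a basis (x, w) of the plane
    # Image of w: arbitrary according to the text; this particular choice
    # (an assumption of the sketch) keeps F nonsingular when y != 0.
    w_image = complement(y) if y != [0, 0] else w
    det = x[0] * w[1] - w[0] * x[1]
    inv = [[w[1] / det, -w[0] / det], [-x[1] / det, x[0] / det]]  # inverse of [x w]
    cols = [[y[0], w_image[0]], [y[1], w_image[1]]]  # matrix with columns y, F(w)
    return [[sum(cols[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def affine_map(A, A1, B, B1):
    """Return f with f(A) = A1 and f(B) = B1, written as f(C) = A1 + F(C - A)."""
    x = [B[0] - A[0], B[1] - A[1]]      # x = AB
    y = [B1[0] - A1[0], B1[1] - A1[1]]  # y = AP, where P = T_{-a}(B')
    F = linear_map_sending(x, y)

    def f(C):
        d = [Fraction(C[0]) - A[0], Fraction(C[1]) - A[1]]
        return [A1[0] + F[0][0] * d[0] + F[0][1] * d[1],
                A1[1] + F[1][0] * d[0] + F[1][1] * d[1]]

    return f
```

For instance, `affine_map((0, 0), (5, 5), (1, 0), (5, 7))` carries a pair of points at distance 1 to a pair at distance 2, which illustrates why no number attached to a pair of points can be invariant under all nonsingular affine transformations.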


Chapter 9 

Projective Spaces 


9.1 Definition of a Projective Space

In plane geometry, points and lines in the plane play very similar roles. In order to
emphasize this symmetry, the fundamental property that connects points and lines
in the plane is called incidence, and the fact that a point $A$ lies on a line $l$ or that
a line $l$ passes through a point $A$ expresses in a symmetric form that $A$ and $l$ are
incident. Then one might hope that to each assertion of geometry about incidence
of points and lines there would correspond another assertion obtained from the first
by everywhere interchanging the words “point” and “line.” And such is indeed the
case, with some exceptions. For example, to every pair of distinct points, there is
incident one and only one line. But it is not true that to every pair of distinct lines,
there is incident one and only one point: the exception is the case that the lines are
parallel. Then not a single point is incident to the two lines.

Projective geometry gives us the possibility of eliminating such exceptions by
adding to the plane certain points called points at infinity. For example, if we do
this, then two parallel lines will be incident at some point at infinity. And indeed,
with a naive perception of the external world, we “see” that parallel lines moving
away from us converge and intersect at a point on the “horizon.” Strictly speaking,
the “horizon” is the totality of all points at infinity by which we extend the plane.

In analyzing this example, we may say that a point $p$ of the plane seen by us
corresponds to the point where the line passing through $p$ and the center of our
eye meets the retina. Mathematically, this situation is described using the notion of
central projection.

Let us assume that the plane $\Pi$ that we are investigating is contained in
three-dimensional space. Let us choose in this same space some point $O$ not contained
in the plane $\Pi$. Every point $A$ of the plane $\Pi$ can be joined to $O$ by the line $OA$.
Conversely, a line passing through the point $O$ intersects the plane $\Pi$ in a certain
point, provided that the line is not parallel to $\Pi$. Thus most straight lines passing
through the point $O$ correspond to points $A \in \Pi$. But lines parallel to $\Pi$ intuitively
correspond precisely to points at infinity of the plane $\Pi$, or “points on the horizon.”
See Fig. 9.1.

Fig. 9.1 Central projection



We shall make this notion the basis of the definition of projective space and shall
develop it in more detail in the sequel.

Definition 9.1 Let $\mathsf{L}$ be a vector space of finite dimension. The collection of all
lines $\langle x \rangle$, where $x$ is a nonnull vector of the space $\mathsf{L}$, is called the projectivization
of $\mathsf{L}$ or the projective space $\mathbb{P}(\mathsf{L})$. Here the lines $\langle x \rangle$ themselves are called points of
the projective space $\mathbb{P}(\mathsf{L})$. The dimension of the space $\mathbb{P}(\mathsf{L})$ is defined as the number
$\dim \mathbb{P}(\mathsf{L}) = \dim \mathsf{L} - 1$.

As we saw in Chap. 3, all vector spaces of a given dimension $n$ are isomorphic.
This fact is expressed by saying that there exists only one theory of $n$-dimensional
vector spaces. In the same sense, there exists only one theory of $n$-dimensional
projective space.

We shall frequently denote the projective space of dimension $n$ by $\mathbb{P}^n$ if we have
no need of indicating the $(n + 1)$-dimensional vector space on the basis of which it
was constructed.

If $\dim \mathbb{P}(\mathsf{L}) = 1$, then $\mathbb{P}(\mathsf{L})$ is called the projective line, and if $\dim \mathbb{P}(\mathsf{L}) = 2$, then
it is called the projective plane. Lines through the origin in an ordinary plane are
points on the projective line, while lines through the origin in three-dimensional
space are points in the projective plane.

And as earlier, we give the reader the choice whether to consider $\mathsf{L}$ a real or
complex space, or even to consider it as a space over an arbitrary field $\mathbb{K}$ (with
the exception of certain questions related specifically to real spaces). In accordance
with the definition given above, we shall say that $\dim \mathbb{P}(\mathsf{L}) = -1$ if $\dim \mathsf{L} = 0$. In
this case, the set $\mathbb{P}(\mathsf{L})$ is empty.

In order to introduce coordinates in a space $\mathbb{P}(\mathsf{L})$ of dimension $n$, we choose a
basis $e_0, e_1, \ldots, e_n$ in the space $\mathsf{L}$. A point $A \in \mathbb{P}(\mathsf{L})$ is by definition a line $\langle x \rangle$,
where $x$ is some nonnull vector in $\mathsf{L}$. Thus we have the representation

$$x = \alpha_0 e_0 + \alpha_1 e_1 + \cdots + \alpha_n e_n. \tag{9.1}$$

The numbers $(\alpha_0, \alpha_1, \ldots, \alpha_n)$ are called homogeneous coordinates of the point $A$.
But the point $A$ is the entire line $\langle x \rangle$. It can also be obtained in the form $\langle y \rangle$ if
$y = \lambda x$ and $\lambda \ne 0$. Then

$$y = \lambda\alpha_0 e_0 + \lambda\alpha_1 e_1 + \cdots + \lambda\alpha_n e_n.$$

From this it follows that the numbers $(\lambda\alpha_0, \lambda\alpha_1, \ldots, \lambda\alpha_n)$ are also homogeneous
coordinates of the point $A$. That is, homogeneous coordinates are defined only up to
a common nonzero factor. Since by definition, $A = \langle x \rangle$ and $x \ne 0$, they cannot all be
simultaneously equal to zero. In order to emphasize that homogeneous coordinates
are defined only up to a nonzero common factor, they are written in the form

$$(\alpha_0 : \alpha_1 : \alpha_2 : \cdots : \alpha_n). \tag{9.2}$$

Thus if we wish to express some property of the point $A$ in terms of its homogeneous
coordinates, then that assertion must continue to hold if all the homogeneous
coordinates $(\alpha_0, \alpha_1, \ldots, \alpha_n)$ are simultaneously multiplied by the same nonzero number.
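
The statement that homogeneous coordinates are defined only up to a common nonzero factor can be turned into a simple test. The sketch below is illustrative only; the function name `same_point` is hypothetical, and coordinates are assumed to be given as sequences of integers or rational numbers. Two admissible coordinate tuples are proportional exactly when all the $2 \times 2$ minors $\alpha_i\beta_j - \alpha_j\beta_i$ vanish.

```python
def is_admissible(coords):
    """Homogeneous coordinates may not all be zero."""
    return any(c != 0 for c in coords)


def same_point(a, b):
    """True if (a_0 : ... : a_n) and (b_0 : ... : b_n) denote the same point."""
    if len(a) != len(b):
        raise ValueError("coordinate tuples must have the same length")
    if not is_admissible(a) or not is_admissible(b):
        raise ValueError("homogeneous coordinates cannot all be zero")
    n = len(a)
    # Two nonzero tuples are proportional if and only if every 2x2 minor vanishes.
    return all(a[i] * b[j] == a[j] * b[i]
               for i in range(n) for j in range(i + 1, n))
```

Using minors rather than division avoids having to pick a nonzero coordinate to normalize by.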

Let us assume, for example, that we are considering the points of projective space
whose homogeneous coordinates satisfy the relationship

$$F(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0, \tag{9.3}$$

where $F$ is a polynomial in $n + 1$ variables. In order for this requirement actually
to be related to the points and not depend on the factor $\lambda$ by which we can
multiply their homogeneous coordinates, it is necessary that along with the numbers
$(\alpha_0, \alpha_1, \ldots, \alpha_n)$, the relationship (9.3) be satisfied as well by the numbers
$(\lambda\alpha_0, \lambda\alpha_1, \ldots, \lambda\alpha_n)$ for an arbitrary nonzero factor $\lambda$.

Let us elucidate when this requirement is satisfied. To this end, in the polynomial
$F(x_0, x_1, \ldots, x_n)$ let us collect all terms of the form $a x_0^{k_0} x_1^{k_1} \cdots x_n^{k_n}$ with
$k_0 + k_1 + \cdots + k_n = m$ and denote their sum by $F_m$. We thereby obtain the representation

$$F(x_0, x_1, \ldots, x_n) = \sum_{m=0}^{N} F_m(x_0, x_1, \ldots, x_n).$$

It follows at once from the definition of $F_m$ that

$$F_m(\lambda x_0, \lambda x_1, \ldots, \lambda x_n) = \lambda^m F_m(x_0, x_1, \ldots, x_n).$$

From this, we obtain

$$F(\lambda x_0, \lambda x_1, \ldots, \lambda x_n) = \sum_{m=0}^{N} \lambda^m F_m(x_0, x_1, \ldots, x_n).$$

Our condition means that the equality $\sum_{m=0}^{N} \lambda^m F_m = 0$ is satisfied for the
coordinates of the points in question and simultaneously for all nonzero values of $\lambda$. Let
us denote by $c_m$ the value $F_m(\alpha_0, \alpha_1, \ldots, \alpha_n)$ for some concrete choice of
homogeneous coordinates $(\alpha_0, \alpha_1, \ldots, \alpha_n)$. Then we arrive at the condition
$\sum_{m=0}^{N} c_m \lambda^m = 0$ for all nonzero values $\lambda$. This means that the polynomial
$\sum_{m=0}^{N} c_m \lambda^m$ in the variable $\lambda$ has an infinite number of roots (for simplicity, we are
now assuming that the field $\mathbb{K}$ over which the vector space $\mathsf{L}$ is being considered is
infinite; however, it would be possible to eliminate this restriction). Then, by a
well-known theorem on polynomials, all the coefficients $c_m$ are equal to zero. In other
words, our equality (9.3) is reduced to the satisfaction of the relationship

$$F_m(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0, \quad m = 0, 1, \ldots, N. \tag{9.4}$$

The polynomial $F_m$ contains only monomials of the same degree $m$, that is, it is
homogeneous. We see that a property of the point $A$ expressed by an algebraic
relationship between its homogeneous coordinates does not depend on the permissible
selection of coordinates, but only on the point $A$ itself, if it is expressed by setting
homogeneous polynomials in its coordinates equal to zero.
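
The splitting of a polynomial into homogeneous components can be made concrete. In the following sketch (an illustration with hypothetical names, not part of the text) a polynomial is assumed to be stored as a dictionary mapping exponent tuples $(k_0, \ldots, k_n)$ to coefficients; `homogeneous_components` groups the monomials by total degree $m$, and `well_defined_at` checks condition (9.4) at a given coordinate tuple.

```python
from fractions import Fraction


def evaluate(poly, point):
    """Value of the polynomial {exponent tuple: coefficient} at the given point."""
    total = Fraction(0)
    for exps, coeff in poly.items():
        term = Fraction(coeff)
        for x, k in zip(point, exps):
            term *= Fraction(x) ** k
        total += term
    return total


def homogeneous_components(poly):
    """Group monomials by total degree: returns {m: F_m}."""
    parts = {}
    for exps, coeff in poly.items():
        parts.setdefault(sum(exps), {})[exps] = coeff
    return parts


def well_defined_at(poly, point):
    """True if every homogeneous component F_m vanishes at the point, as in (9.4)."""
    return all(evaluate(part, point) == 0
               for part in homogeneous_components(poly).values())


# Example polynomial F = x0^2 - x1^2 + x0 - x1 in two variables (chosen for illustration).
F = {(2, 0): 1, (0, 2): -1, (1, 0): 1, (0, 1): -1}
```

For this $F$ the coordinates $(1, -2)$ satisfy $F = 0$, but the proportional coordinates $(2, -4)$ do not, so the relationship $F = 0$ alone is not a property of the point $(1 : -2)$; at $(1 : 1)$ both components $F_1$ and $F_2$ vanish and the condition is independent of the scaling.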

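If $\mathsf{L}' \subset \mathsf{L}$ is a vector subspace, then $\mathbb{P}(\mathsf{L}') \subset \mathbb{P}(\mathsf{L})$, since every line $\langle x \rangle$ contained in
$\mathsf{L}'$ is also contained in $\mathsf{L}$. Such subsets $\mathbb{P}(\mathsf{L}') \subset \mathbb{P}(\mathsf{L})$ are called projective subspaces
of the space $\mathbb{P}(\mathsf{L})$. Every $\mathbb{P}(\mathsf{L}')$ is by definition itself a projective space. Its dimension
is thus defined by $\dim \mathbb{P}(\mathsf{L}') = \dim \mathsf{L}' - 1$. By analogy with vector spaces, a projective
subspace $\mathbb{P}(\mathsf{L}') \subset \mathbb{P}(\mathsf{L})$ is called a hyperplane if $\dim \mathbb{P}(\mathsf{L}') = \dim \mathbb{P}(\mathsf{L}) - 1$, that
is, if $\dim \mathsf{L}' = \dim \mathsf{L} - 1$, and consequently, $\mathsf{L}'$ is a hyperplane in $\mathsf{L}$.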

A set of points of the space $\mathbb{P}(\mathsf{L})$ defined by the relationships

$$\begin{cases}
F_1(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0, \\
F_2(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0, \\
\qquad\cdots \\
F_m(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0,
\end{cases} \tag{9.5}$$

where $F_1, F_2, \ldots, F_m$ are homogeneous polynomials of differing (in general)
degrees, is called a projective algebraic variety.

Example 9.2 The simplest example of a projective algebraic variety is a projective
subspace. Indeed, as we saw in Sect. 3.7, every vector subspace $\mathsf{L}' \subset \mathsf{L}$ can
be defined with the aid of a system of linear homogeneous equations, and consequently,
a projective subspace $\mathbb{P}(\mathsf{L}') \subset \mathbb{P}(\mathsf{L})$ can be defined by formula (9.5), in
which $m = \dim \mathbb{P}(\mathsf{L}) - \dim \mathbb{P}(\mathsf{L}')$ and the degree of each of the homogeneous
polynomials $F_1, \ldots, F_m$ is equal to 1. Here in the case $m = 1$, we obtain a hyperplane.

Example 9.3 Another important example of a projective algebraic variety is what
are called projective quadrics. They are given by formula (9.5), where $m = 1$ and
the degree of the sole homogeneous polynomial $F_1$ is equal to 2. We shall consider
quadrics in detail in Chap. 11. The simplest examples of projective quadrics appear
in a course in analytic geometry, namely curves of degree 2 in the projective plane.

Example 9.4 Let us consider the set of points of the projective space $\mathbb{P}(\mathsf{L})$ whose
$i$th homogeneous coordinate (in some basis $e_0, e_1, \ldots, e_n$ of the space $\mathsf{L}$) is equal to
zero, and let us denote by $\mathsf{L}_i$ the set of vectors of the space $\mathsf{L}$ associated with these
points. The subset $\mathsf{L}_i \subset \mathsf{L}$ is defined in $\mathsf{L}$ by a single linear equation $\alpha_i = 0$, and
therefore is a hyperplane. This means that $\mathbb{P}(\mathsf{L}_i)$ is a hyperplane in the projective
space $\mathbb{P}(\mathsf{L})$. We shall denote the set of points of the projective space $\mathbb{P}(\mathsf{L})$ whose
$i$th homogeneous coordinate is nonzero by $V_i$. It is obvious that $V_i$ is not a
projective subspace in $\mathbb{P}(\mathsf{L})$.

Fig. 9.2 Affine subset of a projective space

The following construction is a natural generalization of Example 9.4. In the
space $\mathsf{L}$ let an arbitrary basis $e_0, e_1, \ldots, e_n$ be chosen. Let us consider some linear
function $\varphi$ on the space $\mathsf{L}$ not identically equal to zero. Vectors $x \in \mathsf{L}$ for which
$\varphi(x) = 0$ form a hyperplane $\mathsf{L}_\varphi \subset \mathsf{L}$. It is the subspace of solutions of the “system”
consisting of a single linear homogeneous equation. To it is associated the projective
hyperplane $\mathbb{P}(\mathsf{L}_\varphi) \subset \mathbb{P}(\mathsf{L})$. It is obvious that $\mathsf{L}_\varphi$ coincides with the hyperplane
$\mathsf{L}_i$ from Example 9.4 if the linear function $\varphi$ maps each vector $x \in \mathsf{L}$ onto its $i$th
coordinate, that is, $\varphi$ is the $i$th vector of the basis of the space $\mathsf{L}^*$ dual to the
basis $e_0, e_1, \ldots, e_n$ of the space $\mathsf{L}$.

Let us now denote by $W_\varphi$ the set of vectors $x \in \mathsf{L}$ for which $\varphi(x) = 1$. This is
again the set of solutions of the “system” consisting of a single linear equation, but
now inhomogeneous. It can be viewed naturally as an affine space with space of
vectors $\mathsf{L}_\varphi$. Let us denote the set $\mathbb{P}(\mathsf{L}) \setminus \mathbb{P}(\mathsf{L}_\varphi)$ by $V_\varphi$. Then for every point $A \in V_\varphi$
there exists a unique vector $x \in W_\varphi$ for which $A = \langle x \rangle$.

In this way, we may identify the set $V_\varphi$ with the set $W_\varphi$, and with the aid of this
identification, consider $V_\varphi$ an affine space. By definition, its space of vectors is $\mathsf{L}_\varphi$,
and if $A$ and $B$ are two points in $V_\varphi$, then there exist two vectors $x$ and $y$ for which
$\varphi(x) = 1$ and $\varphi(y) = 1$ such that $A = \langle x \rangle$ and $B = \langle y \rangle$, and then $\overrightarrow{AB} = y - x$.
Thus the $n$-dimensional projective space $\mathbb{P}(\mathsf{L})$ can be represented as the union of
the $n$-dimensional affine space $V_\varphi$ and the projective hyperplane $\mathbb{P}(\mathsf{L}_\varphi) \subset \mathbb{P}(\mathsf{L})$; see
Fig. 9.2. In the sequel, we shall call $V_\varphi$ an affine subset of the space $\mathbb{P}(\mathsf{L})$.

Let us choose in the space $\mathsf{L}$ a basis $e_0, \ldots, e_n$ such that $\varphi(e_0) = 1$ and $\varphi(e_i) = 0$
for all $i = 1, \ldots, n$. Then the vector $e_0$ is associated with the point $O = \langle e_0 \rangle$
belonging to the affine subset $V_\varphi$, while all the remaining vectors $e_1, \ldots, e_n$ are in
$\mathsf{L}_\varphi$, and they are associated with the points $\langle e_1 \rangle, \ldots, \langle e_n \rangle$ lying in the hyperplane
$\mathbb{P}(\mathsf{L}_\varphi)$. We have thus constructed in the affine space $(V_\varphi, \mathsf{L}_\varphi)$ a frame of reference
$(O; e_1, \ldots, e_n)$. The coordinates $(\xi_1, \ldots, \xi_n)$ of the point $A \in V_\varphi$ with respect to
this frame of reference are called inhomogeneous coordinates of the point $A$ in our
projective space. We wish to emphasize that they are defined only for points in
the affine subset $V_\varphi$. If we return to the definitions, then we see that the
inhomogeneous coordinates $(\xi_1, \ldots, \xi_n)$ are obtained from the homogeneous coordinates
(9.2) through the formula

$$\xi_i = \frac{\alpha_i}{\alpha_0}, \quad i = 1, \ldots, n. \tag{9.6}$$

It is obvious here that for $x$ from formula (9.1), the function $\varphi$ that we have chosen
assumes the value $\varphi(x) = \alpha_0$.

In order to extend the concept of inhomogeneous coordinates to all points of
a projective space $\mathbb{P}(\mathsf{L}) = V_\varphi \cup \mathbb{P}(\mathsf{L}_\varphi)$, it remains also to consider the points of
the projective hyperplane $\mathbb{P}(\mathsf{L}_\varphi)$. For such points it is natural to assign the value
$\alpha_0 = 0$. Sometimes this is expressed by saying that the inhomogeneous coordinates
$(\xi_1, \ldots, \xi_n)$ of the point $A \in \mathbb{P}(\mathsf{L}_\varphi)$ assume infinite values, which justifies thinking
of $\mathbb{P}(\mathsf{L}_\varphi)$ as a set of “points at infinity” (horizon) for the affine subset $V_\varphi$.

Of course, one could also choose a linear function $\varphi$ such that $\varphi(e_i) = 1$ for
some number $i \in \{0, \ldots, n\}$, not necessarily equal to 0, as was done above, and
$\varphi(e_j) = 0$ for all $j \ne i$. We will denote the associated spaces $V_\varphi$ and $\mathsf{L}_\varphi$ by $V_i$ and
$\mathsf{L}_i$. In this case, the projective space $\mathbb{P}(\mathsf{L})$ can be represented in the analogous form
$V_i \cup \mathbb{P}(\mathsf{L}_i)$, that is, as the union of an affine part $V_i$ and a hyperplane $\mathbb{P}(\mathsf{L}_i)$ for
the corresponding value $i \in \{0, \ldots, n\}$. Sometimes this fact is expressed by saying
that in the projective space $\mathbb{P}(\mathsf{L})$, one may introduce various affine charts. It is not
difficult to see that every point $A$ of a projective space $\mathbb{P}(\mathsf{L})$ is “finite” for some value
$i \in \{0, \ldots, n\}$, that is, it belongs to the subset $V_i$ for the corresponding value $i$. This
follows from the fact that by definition, homogeneous coordinates (9.2) of the point
$A$ are not simultaneously equal to zero. If $\alpha_i \ne 0$ for some $i \in \{0, \ldots, n\}$, then $A$ is
contained in the associated affine subset $V_i$.
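
Formula (9.6) and its analogue for the other affine charts can be illustrated as follows. This is a sketch with hypothetical function names; homogeneous coordinates are assumed to be given as a sequence of integers or rationals, and the chart $V_i$ is selected by the index `i`.

```python
from fractions import Fraction


def inhomogeneous(coords, i=0):
    """Inhomogeneous coordinates of a point in the chart V_i, or None if alpha_i = 0."""
    if all(c == 0 for c in coords):
        raise ValueError("homogeneous coordinates cannot all be zero")
    if coords[i] == 0:
        # The point lies in the hyperplane P(L_i), "at infinity" for the chart V_i.
        return None
    return [Fraction(c) / Fraction(coords[i])
            for j, c in enumerate(coords) if j != i]


def finite_charts(coords):
    """Indices i for which the point belongs to the affine subset V_i."""
    if all(c == 0 for c in coords):
        raise ValueError("homogeneous coordinates cannot all be zero")
    return [i for i, c in enumerate(coords) if c != 0]
```

Since admissible coordinates are never all zero, `finite_charts` always returns a nonempty list, which is the computational form of the remark that every point is “finite” in at least one chart.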

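If $\mathsf{L}'$ and $\mathsf{L}''$ are two subspaces of a space $\mathsf{L}$, then it is obvious that

$$\mathbb{P}(\mathsf{L}') \cap \mathbb{P}(\mathsf{L}'') = \mathbb{P}(\mathsf{L}' \cap \mathsf{L}''). \tag{9.7}$$

It is somewhat more complicated to interpret the set $\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$. It is obvious that
it does not coincide with $\mathbb{P}(\mathsf{L}') \cup \mathbb{P}(\mathsf{L}'')$. For example, if $\mathsf{L}'$ and $\mathsf{L}''$ are two distinct
lines in the plane $\mathsf{L}$, then the set $\mathbb{P}(\mathsf{L}') \cup \mathbb{P}(\mathsf{L}'')$ consisting of two points is in general
not a projective subspace of the space $\mathbb{P}(\mathsf{L})$.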

To give a geometric interpretation to the sets $\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$, we shall introduce the
following notion. Let $P = \langle e \rangle$ and $P' = \langle e' \rangle$ be two distinct points of the projective
space $\mathbb{P}(\mathsf{L})$. Let us set $\mathsf{L}_1 = \langle e, e' \rangle$ and consider the one-dimensional projective
subspace $\mathbb{P}(\mathsf{L}_1)$. It obviously contains both points $P$ and $P'$, and moreover, it is
contained in every projective subspace containing the points $P$ and $P'$. Indeed, if
$\mathsf{L}_2 \subset \mathsf{L}$ is a vector subspace such that $\mathbb{P}(\mathsf{L}_2)$ contains the points $P$ and $P'$, then this
means that $\mathsf{L}_2$ contains the vectors $e$ and $e'$, which implies that it also contains the
entire subspace $\mathsf{L}_1 = \langle e, e' \rangle$. Therefore, by the definition of a projective subspace,
we have that $\mathbb{P}(\mathsf{L}_1) \subset \mathbb{P}(\mathsf{L}_2)$.


Definition 9.5 The one-dimensional projective subspace $\mathbb{P}(\mathsf{L}_1)$ constructed from
two given points $P \ne P'$ is called the line connecting the points $P$ and $P'$.

Theorem 9.6 Let $\mathsf{L}'$ and $\mathsf{L}''$ be two subspaces of a vector space $\mathsf{L}$. Then the union
of lines connecting all possible points of $\mathbb{P}(\mathsf{L}')$ with all possible points of $\mathbb{P}(\mathsf{L}'')$
coincides with the projective subspace $\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$.

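Proof We shall denote by $E$ the union of lines described in the statement of the
theorem. Every such line has the form $\mathbb{P}(\mathsf{L}_1)$, where $\mathsf{L}_1 = \langle e', e'' \rangle$, for vectors $e' \in \mathsf{L}'$
and $e'' \in \mathsf{L}''$. Since both $e'$ and $e''$ belong to $\mathsf{L}' + \mathsf{L}''$, so does the subspace $\mathsf{L}_1$, and
it follows from the preceding discussion that every such line $\mathbb{P}(\mathsf{L}_1)$ belongs to
$\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$. Thus we have proved the set inclusion $E \subset \mathbb{P}(\mathsf{L}' + \mathsf{L}'')$.

Conversely, suppose now that the point $S \in \mathbb{P}(\mathsf{L})$ belongs to the projective
subspace $\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$. This means that $S = \langle e \rangle$, where the vector $e$ is in $\mathsf{L}' + \mathsf{L}''$. And
this implies that the vector $e$ can be represented in the form $e = e' + e''$, where
$e' \in \mathsf{L}'$ and $e'' \in \mathsf{L}''$. This means that $S = \langle e \rangle$ and the vector $e$ belongs to the plane
$\langle e', e'' \rangle$, that is, $S$ lies on the line connecting the point $\langle e' \rangle$ in $\mathbb{P}(\mathsf{L}')$ to the point $\langle e'' \rangle$
in $\mathbb{P}(\mathsf{L}'')$. In other words, we have $S \in E$, and thus the subspace $\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$ is
contained in $E$. Taking into account the reverse inclusion proved above, we obtain the
required equality $E = \mathbb{P}(\mathsf{L}' + \mathsf{L}'')$. □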

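Definition 9.7 The set $\mathbb{P}(\mathsf{L}' + \mathsf{L}'')$ is called the projective cover of the set
$\mathbb{P}(\mathsf{L}') \cup \mathbb{P}(\mathsf{L}'')$ and is denoted by

$$\mathbb{P}(\mathsf{L}' + \mathsf{L}'') = \overline{\mathbb{P}(\mathsf{L}') \cup \mathbb{P}(\mathsf{L}'')}. \tag{9.8}$$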

Recalling Theorem 3.41, we obtain the following result.

Theorem 9.8 If $P'$ and $P''$ are two projective subspaces of a projective space $\mathbb{P}(\mathsf{L})$,
then

$$\dim(P' \cap P'') + \dim(\overline{P' \cup P''}) = \dim P' + \dim P''. \tag{9.9}$$

Example 9.9 If $P'$ and $P''$ are two lines in the projective plane $\mathbb{P}(\mathsf{L})$, $\dim \mathsf{L} = 3$, then
$\dim P' = \dim P'' = 1$ and $\dim(\overline{P' \cup P''}) \le 2$, and from relationship (9.9), we obtain
that $\dim(P' \cap P'') \ge 0$, that is, every pair of lines in the projective plane intersect.
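
Example 9.9 can be checked on a pair of lines that are parallel in an affine chart. The sketch below rests on an assumption not made in the example itself: a line in the projective plane is described by a coefficient triple $(u_0, u_1, u_2)$ of its equation $u_0 x_0 + u_1 x_1 + u_2 x_2 = 0$, in which case the common point of two distinct lines is given by the cross product of their coefficient triples. The function names `meet` and `lies_on` are hypothetical.

```python
def meet(u, v):
    """Homogeneous coordinates of the common point of two distinct lines.

    Each line is given by the coefficients (u0, u1, u2) of its equation
    u0*x0 + u1*x1 + u2*x2 = 0 (a convention assumed for this sketch).
    """
    p = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    if p == (0, 0, 0):
        raise ValueError("the two coefficient triples define the same line")
    return p


def lies_on(point, line):
    """True if the point satisfies the equation of the line."""
    return sum(a * b for a, b in zip(point, line)) == 0


# In the chart x0 != 0, with xi_1 = x1/x0 and xi_2 = x2/x0, the affine lines
# xi_2 = xi_1 + 1 and xi_2 = xi_1 + 2 are parallel.
line_a = (1, 1, -1)   # x0 + x1 - x2 = 0
line_b = (2, 1, -1)   # 2*x0 + x1 - x2 = 0
```

The common point of `line_a` and `line_b` has $x_0 = 0$: it lies in the hyperplane at infinity of the chart, in agreement with the picture of parallel lines meeting on the “horizon.”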

The theory of projective spaces exhibits a beautiful symmetry, which goes under
the name duality (we have already encountered an analogous phenomenon in the
theory of vector spaces; see Sect. 3.7).

Let $\mathsf{L}^*$ be the dual space to $\mathsf{L}$. The projective space $\mathbb{P}(\mathsf{L}^*)$ is called the dual of
$\mathbb{P}(\mathsf{L})$. Every point of the dual space $\mathbb{P}(\mathsf{L}^*)$ is by definition a line $\langle f \rangle$, where $f$ is
a linear function on the space $\mathsf{L}$ not identically zero. Such a function determines a
hyperplane $\mathsf{L}_f \subset \mathsf{L}$, given by the linear homogeneous equation $f(x) = 0$ in the vector
space $\mathsf{L}$, and hence a hyperplane $P_f = \mathbb{P}(\mathsf{L}_f)$ in the projective space $\mathbb{P}(\mathsf{L})$.

Let us prove that the correspondence constructed above between points $\langle f \rangle$ of the
dual space $\mathbb{P}(\mathsf{L}^*)$ and hyperplanes $P_f$ of the space $\mathbb{P}(\mathsf{L})$ is a bijection. To do so, we
must prove that the equations $f = 0$ and $\alpha f = 0$ with $\alpha \ne 0$ are equivalent, defining
one and the same hyperplane, that is, $P_f = P_{\alpha f}$. As was shown in Sect. 3.7, every
hyperplane $\mathsf{L}' \subset \mathsf{L}$ is determined by a single nonzero linear equation. Two different
equations $f = 0$ and $f_1 = 0$ can define one and the same hyperplane only if $f_1 = \alpha f$,
where $\alpha$ is some nonzero number. Indeed, in the contrary case, the system of the two
equations $f = 0$ and $f_1 = 0$ has rank 2, and therefore, it defines a subspace $\mathsf{L}''$ of
dimension $n - 2$ in $\mathsf{L}$ and a subspace $\mathbb{P}(\mathsf{L}'') \subset \mathbb{P}(\mathsf{L})$ of dimension $n - 3$ (here $n$ denotes
$\dim \mathsf{L}$), which is obviously not a hyperplane. Thus the dual space $\mathbb{P}(\mathsf{L}^*)$ can be
interpreted as the space of hyperplanes in $\mathbb{P}(\mathsf{L})$. This is the simplest example of the
fact that certain geometric objects cannot be described by numbers (in the way that,
for example, vector spaces can be described by their dimension), but constitute a set
having a geometric character. We shall encounter more complex examples in Chap. 10.

There is also a much more general fact, namely that there is a bijection between
$m$-dimensional projective subspaces of the space $\mathbb{P}(\mathsf{L})$ (of dimension $n$) and subspaces
of dimension $n - m - 1$ of the space $\mathbb{P}(\mathsf{L}^*)$. We shall now describe this correspondence,
and the reader will easily verify that for $m = n - 1$, this coincides with the
above-described correspondence between hyperplanes in $\mathbb{P}(\mathsf{L})$ and points in $\mathbb{P}(\mathsf{L}^*)$.

Let $\mathsf{L}' \subset \mathsf{L}$ be a subspace of dimension $m + 1$, so that $\dim \mathbb{P}(\mathsf{L}') = m$. Let us
consider in the dual space $\mathsf{L}^*$ the annihilator $(\mathsf{L}')^a$ of the subspace $\mathsf{L}'$. Let us recall that
the annihilator is the subspace $(\mathsf{L}')^a \subset \mathsf{L}^*$ consisting of all linear functions $f \in \mathsf{L}^*$
such that $f(x) = 0$ for all vectors $x \in \mathsf{L}'$. As we established in Sect. 3.7 (formula
(3.54)), the dimension of the annihilator is equal to

$$\dim (\mathsf{L}')^a = \dim \mathsf{L} - \dim \mathsf{L}' = n - m. \tag{9.10}$$

The projective subspace $\mathbb{P}((\mathsf{L}')^a) \subset \mathbb{P}(\mathsf{L}^*)$ is called the dual to the subspace
$\mathbb{P}(\mathsf{L}') \subset \mathbb{P}(\mathsf{L})$. By (9.10), its dimension is $n - m - 1$. What we have here is a variant
of a concept that is well known to us. If a nonsingular symmetric bilinear form
$(x, y)$ is defined on the space $\mathsf{L}$, then we can identify $(\mathsf{L}')^a$ with the orthogonal
complement to $\mathsf{L}'$, which was denoted by $(\mathsf{L}')^{\perp}$; see p. 198. If we write the bilinear form
$(x, y)$ in some orthonormal basis of the space $\mathsf{L}$, then it takes the form $\sum_{i=0}^{n} x_i y_i$,
and the point with coordinates $(y_0, y_1, \ldots, y_n)$ will correspond to the hyperplane
defined by the equation

$$\sum_{i=0}^{n} x_i y_i = 0,$$

in which $y_0, \ldots, y_n$ are taken as fixed, and $x_0, \ldots, x_n$ are variables.

The assertions we have proved, together with the duality principle established in
Sect. 3.7, lead automatically to the following result, called the principle of projective
duality.

Proposition 9.10 (Principle of projective duality) If a theorem is proved for all
projective spaces of a given finite dimension $n$ over a given field $\mathbb{K}$ in a formulation
that uses only the concepts of projective subspace, dimension, projective cover, and
intersection, then for all such spaces, one has also the dual theorem obtained from
the original one by the following substitutions:

$$\begin{array}{lcl}
\text{dimension } m & \longrightarrow & \text{dimension } n - m - 1 \\
\text{intersection } P_1 \cap P_2 & \longrightarrow & \text{projective cover } \overline{P_1 \cup P_2} \\
\text{projective cover } \overline{P_1 \cup P_2} & \longrightarrow & \text{intersection } P_1 \cap P_2.
\end{array}$$


For example, the assertion “through two distinct points of the projective plane
there passes one line” has as its dual assertion “every pair of distinct lines in the
projective plane intersect in one point.”

One may try to extend this principle in such a way that it will cover not only
projective spaces, but also the projective algebraic varieties described by equation
(9.5). However, in this regard there appear some new difficulties, which we shall
only mention here without going into detail.

Assume, for example, that a projective algebraic variety $X \subset \mathbb{P}(\mathsf{L})$ is given by
the single equation

$$F(x_0, x_1, \ldots, x_n) = 0,$$

where $F$ is a homogeneous polynomial. To every point $A \in X$ there corresponds a
hyperplane given by the equation

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, x_i = 0, \tag{9.11}$$

called the tangent hyperplane to $X$ at the point $A$ (this notion will be discussed later
in greater detail). By the above considerations, we can assign to this hyperplane the
point $B$ of the dual space $\mathbb{P}(\mathsf{L}^*)$.

It is natural to suppose that as $A$ runs through all points of $X$, the point $B$ also
runs through some projective algebraic variety in the space $\mathbb{P}(\mathsf{L}^*)$, called the dual
to the original variety $X$. This is indeed the case, except for certain unpleasant
exceptions. Namely, for some point $A$, it could be the case that all partial derivatives
$\frac{\partial F}{\partial x_i}(A)$ are equal to 0 for $i = 0, 1, \ldots, n$, and equation (9.11) takes the form of the
identity $0 = 0$. Such points are called singular points of the projective algebraic
variety $X$. In this case, we do not obtain any hyperplane, and therefore, we cannot use
the indicated method to assign to the point $A$ a given point of the space $\mathbb{P}(\mathsf{L}^*)$. It
is possible to prove that singular points are in some sense exceptional. Moreover,
many very interesting varieties have no singular points at all, so that for them, the
dual variety exists. But then in the dual variety, there appear singular points, so that
the beautiful symmetry nevertheless disappears. Overcoming all these difficulties
is the task of algebraic geometry. We shall not go deeply into this, and we have
mentioned it only in connection with the fact that in Chap. 11, devoted to quadrics,
we shall consider precisely the special case in which these difficulties do not appear.
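
Equation (9.11) and the notion of a singular point can be illustrated on a small example. The sketch below uses assumptions of its own: a homogeneous polynomial is stored as a dictionary from exponent tuples to coefficients, the function names are hypothetical, and the example curve $x_0 x_2^2 - x_1^3 = 0$ in the projective plane is chosen here for illustration and does not come from the text.

```python
from fractions import Fraction


def evaluate(poly, point):
    """Value of the polynomial {exponent tuple: coefficient} at the given point."""
    total = Fraction(0)
    for exps, coeff in poly.items():
        term = Fraction(coeff)
        for x, k in zip(point, exps):
            term *= Fraction(x) ** k
        total += term
    return total


def partial(poly, i):
    """Partial derivative of the polynomial with respect to the i-th variable."""
    result = {}
    for exps, coeff in poly.items():
        k = exps[i]
        if k == 0:
            continue
        lowered = exps[:i] + (k - 1,) + exps[i + 1:]
        result[lowered] = result.get(lowered, 0) + coeff * k
    return result


def tangent_hyperplane(poly, point):
    """Coefficients of equation (9.11) at a point of X, or None at a singular point."""
    if evaluate(poly, point) != 0:
        raise ValueError("the point does not lie on the variety F = 0")
    grad = [evaluate(partial(poly, i), point) for i in range(len(point))]
    return None if all(g == 0 for g in grad) else grad


# Illustrative curve: F = x0 * x2^2 - x1^3.
cubic = {(1, 0, 2): 1, (0, 3, 0): -1}
```

At the point $(1 : 1 : 1)$ of this curve the gradient is nonzero and gives the coefficients of a tangent line, that is, a point of the dual plane, whereas at $(1 : 0 : 0)$ all three partial derivatives vanish, so that point is singular and no tangent hyperplane is obtained.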
9.2 Projective Transformations

Let $\mathcal{A}$ be a linear transformation of a vector space $\mathsf{L}$ into itself. It is natural to
entertain the idea of extending it to the projective space $\mathbb{P}(\mathsf{L})$. It would seem to be
something easy to do: one has only to associate with each point $P \in \mathbb{P}(\mathsf{L})$
corresponding to the line $\langle e \rangle$ in $\mathsf{L}$, the line $\langle \mathcal{A}(e) \rangle$, which is some point of the projective
space $\mathbb{P}(\mathsf{L})$. However, here we encounter the following difficulty: If $\mathcal{A}(e) = 0$, then
we cannot construct the line $\langle \mathcal{A}(e) \rangle$, since all vectors proportional to $\mathcal{A}(e)$ are the
null vector. Thus the transformation that we wish to construct is not defined in
general for all points of the projective space $\mathbb{P}(\mathsf{L})$. If we wish to define it for
all points, then we must require that the kernel of the transformation $\mathcal{A}$ be $(0)$. As
we know, this condition is equivalent to the transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ being
nonsingular. Thus to all nonsingular transformations $\mathcal{A}$ of the space $\mathsf{L}$ into itself (and only
these) there correspond mappings of the projective space $\mathbb{P}(\mathsf{L})$ into itself. We shall
denote them by $\mathbb{P}(\mathcal{A})$.

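We have seen that a nonsingular transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ defines a bijective
mapping of the space $\mathsf{L}$ into itself. Let us prove that in this case, the corresponding
mapping $\mathbb{P}(\mathcal{A}) : \mathbb{P}(\mathsf{L}) \to \mathbb{P}(\mathsf{L})$ is also a bijection. First, let us verify that its image
coincides with all of $\mathbb{P}(\mathsf{L})$. Let $P$ be a point of the space $\mathbb{P}(\mathsf{L})$. It corresponds to some
line $\langle e \rangle$ in $\mathsf{L}$. Since the transformation $\mathcal{A}$ is nonsingular, it follows that $e = \mathcal{A}(e')$
for some vector $e' \in \mathsf{L}$, and moreover, $e' \ne 0$, since $e \ne 0$. If $P'$ is the point of the
space $\mathbb{P}(\mathsf{L})$ corresponding to the line $\langle e' \rangle$, then $P = \mathbb{P}(\mathcal{A})(P')$. It remains to show
that $\mathbb{P}(\mathcal{A})$ cannot map two distinct points into one. Let us suppose that $P \ne P'$ and

$$\mathbb{P}(\mathcal{A})(P) = \mathbb{P}(\mathcal{A})(P') = \overline{P}, \tag{9.12}$$

where the points $P$, $P'$, and $\overline{P}$ correspond to the lines $\langle e \rangle$, $\langle e' \rangle$, and $\langle \overline{e} \rangle$ respectively.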

The condition P ^ P' is équivalent to the vectors e and e' being linearly in- 
dependent, while from equality (9.12) it follows that (A>(£)) = (^(e')) — (<?), 
which means that the vectors A(e) and A(e f ) are linearly dépendent. But if 
aA(e) + /3A(e') = 0, where a ^ 0 or fi 0, then A(ae + fie') — 0, and since 
the transformation A is nonsingular, we hâve ae + fie' 7 ^ 0, which contradicts the 
condition P ^ P'. Thus we hâve proved that the mapping P («A) : P(L) P(L) is a 
bijection. Consequently, the inverse mapping P( e A) _1 is also defined. 

Definition 9.11 A mapping $\mathbb{P}(\mathcal{A})$ of the projective space $\mathbb{P}(\mathsf{L})$ corresponding to the
nonsingular transformation $\mathcal{A}$ of a vector space $\mathsf{L}$ into itself is called a projective
transformation of the space $\mathbb{P}(\mathsf{L})$.

Theorem 9.12 We have the following assertions:

(1) $\mathbb{P}(\mathcal{A}_1) = \mathbb{P}(\mathcal{A}_2)$ if and only if $\mathcal{A}_2 = \lambda\mathcal{A}_1$, where $\lambda$ is some nonzero scalar.

(2) If $\mathcal{A}_1$ and $\mathcal{A}_2$ are two nonsingular transformations of a vector space $\mathsf{L}$, then
$\mathbb{P}(\mathcal{A}_1\mathcal{A}_2) = \mathbb{P}(\mathcal{A}_1)\mathbb{P}(\mathcal{A}_2)$.

(3) If $\mathcal{A}$ is a nonsingular transformation, then $\mathbb{P}(\mathcal{A})^{-1} = \mathbb{P}(\mathcal{A}^{-1})$.

(4) A projective transformation $\mathbb{P}(\mathcal{A})$ carries every projective subspace of the space
$\mathbb{P}(\mathsf{L})$ into a subspace of the same dimension.

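Proof All the assertions of the theorem follow directly from the definitions.

(1) If $\mathcal{A}_2 = \lambda\mathcal{A}_1$, then it is obvious that $\mathcal{A}_1$ and $\mathcal{A}_2$ map lines of the vector space
$\mathsf{L}$ in exactly the same way, that is, $\mathbb{P}(\mathcal{A}_1) = \mathbb{P}(\mathcal{A}_2)$. Now suppose, conversely, that
$\mathbb{P}(\mathcal{A}_1)(A) = \mathbb{P}(\mathcal{A}_2)(A)$ for an arbitrary point $A \in \mathbb{P}(\mathsf{L})$. If the point $A$ corresponds
to the line $\langle \boldsymbol{e} \rangle$, then we have $\langle \mathcal{A}_1(\boldsymbol{e}) \rangle = \langle \mathcal{A}_2(\boldsymbol{e}) \rangle$, that is,

$$\mathcal{A}_2(\boldsymbol{e}) = \lambda\mathcal{A}_1(\boldsymbol{e}), \tag{9.13}$$

where $\lambda$ is some scalar. However, in theory, the number $\lambda$ in relationship (9.13) could
have had its own value for each vector $\boldsymbol{e}$. Let us consider two linearly independent
vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, and for the vectors $\boldsymbol{x}$, $\boldsymbol{y}$, and $\boldsymbol{x} + \boldsymbol{y}$, let us write down condition (9.13):

$$\begin{aligned}
\mathcal{A}_2(\boldsymbol{x}) &= \lambda\mathcal{A}_1(\boldsymbol{x}), \\
\mathcal{A}_2(\boldsymbol{y}) &= \mu\mathcal{A}_1(\boldsymbol{y}), \\
\mathcal{A}_2(\boldsymbol{x} + \boldsymbol{y}) &= \nu\mathcal{A}_1(\boldsymbol{x} + \boldsymbol{y}).
\end{aligned} \tag{9.14}$$

In view of the linearity of $\mathcal{A}_1$ and $\mathcal{A}_2$, we have

$$\mathcal{A}_1(\boldsymbol{x} + \boldsymbol{y}) = \mathcal{A}_1(\boldsymbol{x}) + \mathcal{A}_1(\boldsymbol{y}), \qquad \mathcal{A}_2(\boldsymbol{x} + \boldsymbol{y}) = \mathcal{A}_2(\boldsymbol{x}) + \mathcal{A}_2(\boldsymbol{y}). \tag{9.15}$$

Having substituted expressions (9.15) into the third equality of (9.14), we then subtract
from it the first and second equalities. We then obtain

$$(\nu - \lambda)\mathcal{A}_1(\boldsymbol{x}) + (\nu - \mu)\mathcal{A}_1(\boldsymbol{y}) = \mathcal{A}_1\bigl((\nu - \lambda)\boldsymbol{x} + (\nu - \mu)\boldsymbol{y}\bigr) = \boldsymbol{0}.$$

Since the transformation $\mathcal{A}_1$ is nonsingular (by the definition of a projective
transformation), it follows that $(\nu - \lambda)\boldsymbol{x} + (\nu - \mu)\boldsymbol{y} = \boldsymbol{0}$, and in view of the linear
independence of the vectors $\boldsymbol{x}$ and $\boldsymbol{y}$, it follows from this that $\lambda = \nu$ and $\mu = \nu$, that is, all
the scalars $\lambda$, $\mu$, $\nu$ in (9.14) are the same, and therefore the scalar $\lambda$ in relationship
(9.13) is one and the same for all vectors $\boldsymbol{e} \in \mathsf{L}$.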

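(2) We must prove that for every point $P$ with corresponding line $\langle \boldsymbol{e} \rangle$, we have
the equality $\mathbb{P}(\mathcal{A}_1\mathcal{A}_2)(P) = \mathbb{P}(\mathcal{A}_1)(\mathbb{P}(\mathcal{A}_2)(P))$, and this, by the definition of a
projective transformation, follows from the fact that $\langle (\mathcal{A}_1\mathcal{A}_2)(\boldsymbol{e}) \rangle = \langle \mathcal{A}_1(\mathcal{A}_2(\boldsymbol{e})) \rangle$. The
last equality follows from the definition of the product of linear transformations.

(3) By what we have proven, we have the equality $\mathbb{P}(\mathcal{A})\mathbb{P}(\mathcal{A}^{-1}) = \mathbb{P}(\mathcal{A}\mathcal{A}^{-1}) = \mathbb{P}(\mathcal{E})$.
It is obvious that $\mathbb{P}(\mathcal{E})$ is the identity transformation of the space $\mathbb{P}(\mathsf{L})$ into
itself. From this, it follows that $\mathbb{P}(\mathcal{A})^{-1} = \mathbb{P}(\mathcal{A}^{-1})$.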

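(4) Finally, let $\mathsf{L}'$ be an $m$-dimensional subspace of the vector space $\mathsf{L}$ and let
$\mathbb{P}(\mathsf{L}')$ be the associated $(m - 1)$-dimensional projective subspace. The mapping
$\mathbb{P}(\mathcal{A})$ takes $\mathbb{P}(\mathsf{L}')$ into the collection of points of the form $P'' = \langle \mathcal{A}(\boldsymbol{e}') \rangle$, where
$P' = \langle \boldsymbol{e}' \rangle$ runs through all points of $\mathbb{P}(\mathsf{L}')$, that is, $\boldsymbol{e}'$ runs through
all nonnull vectors of the space $\mathsf{L}'$. Let us prove that here, the vectors $\mathcal{A}(\boldsymbol{e}')$ coincide with
the nonnull vectors of some vector subspace $\mathsf{L}''$ having the same dimension as $\mathsf{L}'$.
This will give us the required assertion.

In the subspace $\mathsf{L}'$, let us choose a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$. Then every vector $\boldsymbol{e}' \in \mathsf{L}'$ can
be represented in the form

$$\boldsymbol{e}' = \alpha_1\boldsymbol{e}_1 + \cdots + \alpha_m\boldsymbol{e}_m,$$

while the condition $\boldsymbol{e}' \neq \boldsymbol{0}$ is equivalent to not all the coefficients $\alpha_i$ being equal to
zero. From this, we obtain

$$\mathcal{A}(\boldsymbol{e}') = \alpha_1\mathcal{A}(\boldsymbol{e}_1) + \cdots + \alpha_m\mathcal{A}(\boldsymbol{e}_m). \tag{9.16}$$

The vectors $\mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_m)$ are linearly independent, since the transformation
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$ is nonsingular. Let us consider the $m$-dimensional subspace
$\mathsf{L}'' = \langle \mathcal{A}(\boldsymbol{e}_1), \ldots, \mathcal{A}(\boldsymbol{e}_m) \rangle$. From the relationship (9.16), it follows that the transformation
$\mathbb{P}(\mathcal{A})$ takes the points of the subspace $\mathbb{P}(\mathsf{L}')$ precisely into the points of the subspace
$\mathbb{P}(\mathsf{L}'')$. From the equality $\dim \mathsf{L}' = \dim \mathsf{L}'' = m$, we obtain
$\dim \mathbb{P}(\mathsf{L}') = \dim \mathbb{P}(\mathsf{L}'') = m - 1$. □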

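By analogy with linear and affine transformations, there is a hope that we can describe
a projective transformation unambiguously by how it maps a certain number
of "sufficiently independent" points. As a first attempt, we may consider the points
$P_i = \langle \boldsymbol{e}_i \rangle$ for $i = 0, 1, \ldots, n$, where $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ is a basis of the space $\mathsf{L}$. But this
path does not lead to our goal, for there exist too many distinct transformations taking
each point $P_i$ into itself. Indeed, such are all the transformations of the form
$\mathbb{P}(\mathcal{A})$ if $\mathcal{A}(\boldsymbol{e}_i) = \lambda_i\boldsymbol{e}_i$ with arbitrary $\lambda_i \neq 0$, that is, in other words, if $\mathcal{A}$ has, in the
basis $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, the matrix

$$A = \begin{pmatrix}
\lambda_0 & 0 & \cdots & 0 \\
0 & \lambda_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}.$$

In this case, $\langle \mathcal{A}(\boldsymbol{e}_i) \rangle = \langle \boldsymbol{e}_i \rangle$ for all $i = 0, 1, \ldots, n$. However, the image of an
arbitrary vector

$$\boldsymbol{e} = \alpha_0\boldsymbol{e}_0 + \alpha_1\boldsymbol{e}_1 + \cdots + \alpha_n\boldsymbol{e}_n$$

is equal to

$$\mathcal{A}(\boldsymbol{e}) = \alpha_0\lambda_0\boldsymbol{e}_0 + \alpha_1\lambda_1\boldsymbol{e}_1 + \cdots + \alpha_n\lambda_n\boldsymbol{e}_n,$$

and this vector is in general not proportional to $\boldsymbol{e}$ unless all the $\lambda_i$ are identical. Thus even
knowing how the transformation $\mathbb{P}(\mathcal{A})$ maps the points $P_0, P_1, \ldots, P_n$, we are not
yet able to determine it uniquely. But it turns out that the addition of one more point
(under some weak assumptions) describes the transformation uniquely. For this, we
need to introduce a new concept.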
Definition 9.13 In the $n$-dimensional projective space $\mathbb{P}(\mathsf{L})$, $n + 2$ points

$$P_0, P_1, \ldots, P_n, P_{n+1} \tag{9.17}$$

are said to be independent if no $n + 1$ of them lie in a subspace of dimension less
than $n$.

For example, four points in the projective plane are independent if no three of 
them are collinear. 

Let us explore what the condition of independence means if to the point $P_i$
there corresponds the line $\langle \boldsymbol{e}_i \rangle$, $i = 0, \ldots, n + 1$. Since by definition, the points
$P_0, \ldots, P_n$ do not lie in a subspace of dimension less than $n$, it follows that the
vectors $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ do not lie in a subspace of dimension less than $n + 1$, that
is, they are linearly independent, and this means that they constitute a basis of the
space $\mathsf{L}$. Thus the vector $\boldsymbol{e}_{n+1}$ is a linear combination of these vectors:

$$\boldsymbol{e}_{n+1} = \alpha_0\boldsymbol{e}_0 + \alpha_1\boldsymbol{e}_1 + \cdots + \alpha_n\boldsymbol{e}_n. \tag{9.18}$$

If some scalar $\alpha_i$ is equal to 0, then from (9.18), it follows that the vector $\boldsymbol{e}_{n+1}$
lies in the subspace $\mathsf{L}' = \langle \boldsymbol{e}_0, \ldots, \breve{\boldsymbol{e}}_i, \ldots, \boldsymbol{e}_n \rangle$, where the sign $\breve{\ }$ indicates the omission
of the corresponding vector. Consequently, the vectors $\boldsymbol{e}_0, \ldots, \breve{\boldsymbol{e}}_i, \ldots, \boldsymbol{e}_n, \boldsymbol{e}_{n+1}$
lie in a subspace $\mathsf{L}'$ whose dimension does not exceed $n$. But this means that the
points $P_0, \ldots, \breve{P}_i, \ldots, P_n, P_{n+1}$ lie in the projective space $\mathbb{P}(\mathsf{L}')$, and moreover,
$\dim \mathbb{P}(\mathsf{L}') \le n - 1$, that is, they are dependent.

Let us show that for the independence of points (9.17), it suffices that in the
decomposition (9.18), all coefficients $\alpha_i$ be nonzero. Let the vectors $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$
form a basis of the space $\mathsf{L}$, while the vector $\boldsymbol{e}_{n+1}$ is a linear combination (9.18)
of them such that all the $\alpha_i$ are nonzero. Let us show that then, the points (9.17)
are independent. If this were not the case, then some $n + 1$ vectors from among
$\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_{n+1}$ of the space $\mathsf{L}$ would lie in a subspace of dimension not greater
than $n$. These cannot be the vectors $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, since by assumption, they constitute
a basis of $\mathsf{L}$. So let them be the vectors $\boldsymbol{e}_0, \ldots, \breve{\boldsymbol{e}}_i, \ldots, \boldsymbol{e}_n, \boldsymbol{e}_{n+1}$ for some $i < n + 1$,
and let their linear dependence be expressed by the equality

$$\lambda_0\boldsymbol{e}_0 + \cdots + \lambda_{i-1}\boldsymbol{e}_{i-1} + \lambda_{i+1}\boldsymbol{e}_{i+1} + \cdots + \lambda_{n+1}\boldsymbol{e}_{n+1} = \boldsymbol{0},$$

where $\lambda_{n+1} \neq 0$, since the vectors $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ are linearly independent. From
this, it follows that the vector $\boldsymbol{e}_{n+1}$ is a linear combination of the vectors
$\boldsymbol{e}_0, \ldots, \breve{\boldsymbol{e}}_i, \ldots, \boldsymbol{e}_n$. But this contradicts the condition that in the expression (9.18),
all the $\alpha_i$ are nonzero, since the vectors $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ form a basis of the space $\mathsf{L}$,
and the decomposition (9.18) for an arbitrary vector $\boldsymbol{e}_{n+1}$ uniquely determines its
coordinates $\alpha_i$.

Thus, $n + 2$ independent points (9.17) are always obtained from $n + 1$ points
$P_i = \langle \boldsymbol{e}_i \rangle$ whose corresponding vectors $\boldsymbol{e}_i$ form a basis of the space $\mathsf{L}$ by the addition
of one more point $P = \langle \boldsymbol{e} \rangle$ for which the vector $\boldsymbol{e}$ is a linear combination of the
vectors $\boldsymbol{e}_i$ with all nonzero coefficients.
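
For the projective plane ($n = 2$), the condition of independence can be checked mechanically: four points are independent exactly when every three of the representing vectors are linearly independent, that is, when every $3 \times 3$ determinant formed from them is nonzero. The following Python sketch illustrates this under the assumption that points are given by integer homogeneous coordinates; the names `det3` and `independent_in_plane` are illustrative and do not come from the text.

```python
from itertools import combinations

def det3(u, v, w):
    """Determinant of the 3x3 matrix whose rows are the vectors u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def independent_in_plane(points):
    """Four points of the projective plane, given by homogeneous coordinates,
    are independent iff no three of them are collinear, i.e. iff every triple
    of representing vectors has a nonzero determinant."""
    return all(det3(*triple) != 0 for triple in combinations(points, 3))

# A basis e0, e1, e2 together with e0 + e1 + e2 (all coefficients nonzero).
good = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
# Here the fourth vector is e0 + e1, so the coefficient of e2 is zero.
bad = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
print(independent_in_plane(good), independent_in_plane(bad))
```

The second collection fails precisely because one coefficient in the decomposition (9.18) vanishes, in agreement with the criterion just derived.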

We can now formulate our main result.

Theorem 9.14 Let

$$P_0, P_1, \ldots, P_n, P_{n+1}; \qquad P'_0, P'_1, \ldots, P'_n, P'_{n+1} \tag{9.19}$$

be two systems of independent points of the projective space $\mathbb{P}(\mathsf{L})$ of dimension $n$.
Then there exists a projective transformation taking the point $P_i$ to $P'_i$ for all
$i = 0, 1, \ldots, n + 1$, and moreover, it is unique.


Proof We shall use the interpretation of the property of independence of points
obtained above. Let the points $P_i$ correspond to the lines $\langle \boldsymbol{e}_i \rangle$, and let the points $P'_i$
correspond to the lines $\langle \boldsymbol{e}'_i \rangle$. We may assume that the vectors $\boldsymbol{e}_0, \ldots, \boldsymbol{e}_n$ and the vectors
$\boldsymbol{e}'_0, \ldots, \boldsymbol{e}'_n$ are bases of the $(n + 1)$-dimensional space $\mathsf{L}$. Then as we know, for
every collection of nonzero scalars $\lambda_0, \ldots, \lambda_n$, there exists (and it is unique) a nonsingular
linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ mapping $\boldsymbol{e}_i$ to $\lambda_i\boldsymbol{e}'_i$ for all $i = 0, 1, \ldots, n$.

By definition, for such a transformation $\mathcal{A}$, we have $\mathbb{P}(\mathcal{A})(P_i) = P'_i$ for all
$i = 0, 1, \ldots, n$. Since $\dim \mathsf{L} = n + 1$, we have the relationships

$$\boldsymbol{e}_{n+1} = \alpha_0\boldsymbol{e}_0 + \alpha_1\boldsymbol{e}_1 + \cdots + \alpha_n\boldsymbol{e}_n, \qquad \boldsymbol{e}'_{n+1} = \alpha'_0\boldsymbol{e}'_0 + \alpha'_1\boldsymbol{e}'_1 + \cdots + \alpha'_n\boldsymbol{e}'_n. \tag{9.20}$$

From the condition of independence of both collections of points (9.19), it follows
that in the representations (9.20), all the coefficients $\alpha_i$ and $\alpha'_i$ are nonzero. Applying
the transformation $\mathcal{A}$ to both sides of the first relationship in (9.20), taking into
account the equalities $\mathcal{A}(\boldsymbol{e}_i) = \lambda_i\boldsymbol{e}'_i$, we obtain

$$\mathcal{A}(\boldsymbol{e}_{n+1}) = \alpha_0\lambda_0\boldsymbol{e}'_0 + \alpha_1\lambda_1\boldsymbol{e}'_1 + \cdots + \alpha_n\lambda_n\boldsymbol{e}'_n. \tag{9.21}$$

After setting the scalars $\lambda_i$ equal to $\alpha'_i\alpha_i^{-1}$ for all $i = 0, 1, \ldots, n$ and substituting
them into the relationship (9.21), taking into account the second equality of formula
(9.20), we obtain that $\mathcal{A}(\boldsymbol{e}_{n+1}) = \boldsymbol{e}'_{n+1}$, that is, $\mathbb{P}(\mathcal{A})(P_{n+1}) = P'_{n+1}$.

The uniqueness of the projective transformation $\mathbb{P}(\mathcal{A})$ that we have obtained
follows from its construction. □


For example, for $n = 1$, the space $\mathbb{P}(\mathsf{L})$ is the projective line. Three points
$P_0, P_1, P_2$ are independent if and only if they are distinct. We see that any three
distinct points on the projective line can be mapped into three other distinct points
by a unique projective transformation.
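
The construction in the proof of Theorem 9.14 can be carried out explicitly for the projective line. The following Python sketch is only an illustration, assuming that each point is given by a representing vector with integer coordinates; the helper names (`det2`, `coefficients`, `apply`, `same_point`, `projective_map`) are hypothetical. It finds the coefficients of (9.20) by Cramer's rule, sets $\lambda_i = \alpha'_i\alpha_i^{-1}$, and assembles the matrix of $\mathcal{A}$.

```python
from fractions import Fraction

def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def coefficients(e0, e1, e2):
    """Solve e2 = a0*e0 + a1*e1 by Cramer's rule (e0, e1 must form a basis)."""
    d = det2(e0, e1)
    return Fraction(det2(e2, e1), d), Fraction(det2(e0, e2), d)

def apply(A, v):
    """Apply the 2x2 matrix A to the vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

def same_point(u, v):
    """Two nonnull vectors determine the same point iff they are proportional."""
    return det2(u, v) == 0

def projective_map(src, dst):
    """Matrix of a linear map A such that P(A) takes src[i] to dst[i], i = 0, 1, 2."""
    (e0, e1, e2), (f0, f1, f2) = src, dst
    a0, a1 = coefficients(e0, e1, e2)   # e2 = a0*e0 + a1*e1, as in (9.20)
    b0, b1 = coefficients(f0, f1, f2)   # f2 = b0*f0 + b1*f1
    l0, l1 = b0 / a0, b1 / a1           # lambda_i = alpha'_i / alpha_i
    # A sends e0 -> l0*f0 and e1 -> l1*f1, so A = T * S^(-1), where the
    # columns of S are e0, e1 and the columns of T are l0*f0, l1*f1.
    d = det2(e0, e1)
    S_inv = [[Fraction(e1[1], d), Fraction(-e1[0], d)],
             [Fraction(-e0[1], d), Fraction(e0[0], d)]]
    T = [[l0 * f0[0], l1 * f1[0]], [l0 * f0[1], l1 * f1[1]]]
    return [[sum(T[i][k] * S_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

src = [(1, 0), (0, 1), (1, 1)]
dst = [(1, 1), (1, 2), (1, 3)]
A = projective_map(src, dst)
print(all(same_point(apply(A, s), t) for s, t in zip(src, dst)))  # prints True
```

Exact rational arithmetic is used so that proportionality of vectors can be tested by an exact zero determinant rather than by a floating-point tolerance.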

Let us now consider how a projective transformation can be given in coordinate
form. In homogeneous coordinates (9.2), the specification of a projective transformation
$\mathbb{P}(\mathcal{A})$ in fact coincides with that of a nonsingular linear transformation $\mathcal{A}$, and
indeed, the homogeneous coordinates of a point $A \in \mathbb{P}(\mathsf{L})$ coincide with the coordinates
of the vector $\boldsymbol{x}$ from (9.1) that determines the line $\langle \boldsymbol{x} \rangle$ corresponding to the
point $A$. Using formula (3.25), we obtain for the homogeneous coordinates $\beta_i$ of
the point $\mathbb{P}(\mathcal{A})(A)$ the following expressions in the homogeneous coordinates $\alpha_i$ of the
point $A$:

$$\begin{aligned}
\beta_0 &= a_{00}\alpha_0 + a_{01}\alpha_1 + a_{02}\alpha_2 + \cdots + a_{0n}\alpha_n, \\
\beta_1 &= a_{10}\alpha_0 + a_{11}\alpha_1 + a_{12}\alpha_2 + \cdots + a_{1n}\alpha_n, \\
&\;\;\vdots \\
\beta_n &= a_{n0}\alpha_0 + a_{n1}\alpha_1 + a_{n2}\alpha_2 + \cdots + a_{nn}\alpha_n.
\end{aligned} \tag{9.22}$$

Here we must recall that the homogeneous coordinates are defined only up to a
common factor, and that in neither of the collections $(\alpha_0 : \alpha_1 : \cdots : \alpha_n)$ and $(\beta_0 : \beta_1 : \cdots : \beta_n)$
are all the entries equal to zero. Clearly, in multiplying all the $\alpha_i$ by the common factor $\lambda$, all the $\beta_i$
in formula (9.22) are also multiplied by this factor. The $\beta_i$ cannot all be zero if
the $\alpha_i$ are not all zero (this follows from the fact that the transformation $\mathcal{A}$ is
nonsingular). The condition of nonsingularity of the transformation $\mathcal{A}$ is expressed
as the determinant of its matrix being nonzero:

$$\begin{vmatrix}
a_{00} & a_{01} & \cdots & a_{0n} \\
a_{10} & a_{11} & \cdots & a_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n0} & a_{n1} & \cdots & a_{nn}
\end{vmatrix} \neq 0.$$



Another way of writing a projective transformation is in inhomogeneous coordinates
of affine spaces. Let us recall that a projective space $\mathbb{P}(\mathsf{L})$ contains affine
subsets $V_i$, $i = 0, 1, \ldots, n$, and it can be obtained from any of the $V_i$ by the addition
of the corresponding projective hyperplane $\mathbb{P}(\mathsf{L}_i)$ consisting of "points at infinity,"
that is, in the form $\mathbb{P}(\mathsf{L}) = V_i \cup \mathbb{P}(\mathsf{L}_i)$. For simplicity of notation, we shall limit
ourselves to the case $i = 0$; all the remaining $V_i$ are considered analogously.

To an affine subset $V_0$ there corresponds (as its space of vectors) the vector
subspace $\mathsf{L}_0 \subset \mathsf{L}$ defined by the condition $\alpha_0 = 0$. For assigning coordinates in the
affine space $V_0$, we must fix in the space some frame of reference consisting of a
point $O \in V_0$ and a basis in the space $\mathsf{L}_0$. In the $(n + 1)$-dimensional space $\mathsf{L}$, let us
choose a basis $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. For the point $O \in V_0$, let us choose the point associated
with the line $\langle \boldsymbol{e}_0 \rangle$, and for the basis in $\mathsf{L}_0$, let us take the vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$.

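Let us consider a point $A \in V_0$, which in the basis $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$
has homogeneous coordinates $(\alpha_0 : \alpha_1 : \cdots : \alpha_n)$, and repeating the arguments that
we used in deriving formulas (9.6), let us find its coordinates with respect to the
frame of reference $(O; \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n)$ constructed in the manner outlined above. The
point $A$ corresponds to the line $\langle \boldsymbol{e} \rangle$, where

$$\boldsymbol{e} = \alpha_0\boldsymbol{e}_0 + \alpha_1\boldsymbol{e}_1 + \cdots + \alpha_n\boldsymbol{e}_n, \tag{9.23}$$

and moreover, $\alpha_0 \neq 0$, since $A \in V_0$. By assumption, we must choose from the two
lines $\langle \boldsymbol{e}_0 \rangle$ and $\langle \boldsymbol{e} \rangle$ vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ with coordinate $\alpha_0 = 1$ and examine the coordinates
of the vector $\boldsymbol{y} - \boldsymbol{x}$ with respect to the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$. It is obvious that
$\boldsymbol{x} = \boldsymbol{e}_0$, and in view of (9.23), we have

$$\boldsymbol{y} = \boldsymbol{e}_0 + \alpha_1\alpha_0^{-1}\boldsymbol{e}_1 + \cdots + \alpha_n\alpha_0^{-1}\boldsymbol{e}_n.$$

Thus the vector $\boldsymbol{y} - \boldsymbol{x}$ has, in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, coordinates

$$x_1 = \frac{\alpha_1}{\alpha_0}, \quad \ldots, \quad x_n = \frac{\alpha_n}{\alpha_0}.$$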

We shall now consider a nonsingular linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ and the
associated projective transformation $\mathbb{P}(\mathcal{A})$, given by formulas (9.22). It takes a point
$A$ with homogeneous coordinates $\alpha_i$ to a point $B$ with homogeneous coordinates $\beta_i$.
In order to obtain in both cases inhomogeneous coordinates in the subset $V_0$, it is
necessary, by formula (9.6), to divide all the coordinates by the coordinate with
index 0. Thus we obtain that a point with inhomogeneous coordinates $x_i = \alpha_i/\alpha_0$ is
mapped to the point with inhomogeneous coordinates $y_i = \beta_i/\beta_0$, that is, taking into
account (9.22), we obtain the expressions

$$y_i = \frac{a_{i0} + a_{i1}x_1 + \cdots + a_{in}x_n}{a_{00} + a_{01}x_1 + \cdots + a_{0n}x_n}, \qquad i = 1, \ldots, n. \tag{9.24}$$

In other words, in inhomogeneous coordinates, a projective transformation can be
written in terms of the linear fractional formulas (9.24) with a common denominator
for all the $y_i$. It is not defined at points where this denominator becomes zero, and these
are the points whose images are "points at infinity," that is, points of the projective
hyperplane $\mathbb{P}(\mathsf{L}_0)$ with equation $\beta_0 = 0$.
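
Formulas (9.24) translate directly into a short computation. The Python sketch below is an illustration only: the function name `apply_inhomogeneous` and the convention of returning `None` when the denominator vanishes are assumptions made here, not part of the text. The matrix rows are indexed $0, 1, \ldots, n$ as in (9.22).

```python
from fractions import Fraction

def apply_inhomogeneous(a, x):
    """Apply the linear fractional formulas (9.24).

    a is the (n+1) x (n+1) matrix (a_ij) of (9.22), x the list of n
    inhomogeneous coordinates x_1, ..., x_n. Returns the list y_1, ..., y_n,
    or None when the common denominator beta_0 is zero, i.e. when the image
    is a point at infinity.
    """
    alpha = [Fraction(1)] + [Fraction(t) for t in x]   # (1 : x_1 : ... : x_n)
    beta = [sum(Fraction(c) * t for c, t in zip(row, alpha)) for row in a]
    if beta[0] == 0:
        return None
    return [b / beta[0] for b in beta[1:]]

m = [[1, 1], [0, 2]]                  # y = 2x / (1 + x) on the projective line
print(apply_inhomogeneous(m, [3]))    # [Fraction(3, 2)]
print(apply_inhomogeneous(m, [-1]))   # None: the denominator 1 + x vanishes
```

Working with the homogeneous vector $(1 : x_1 : \cdots : x_n)$ and dividing only at the end keeps the code identical in every dimension.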

Let us consider projective transformations mapping "points at infinity" to "points
at infinity" and consequently, "finite points" to "finite points." This means that the
equality $\beta_0 = 0$ is possible only for $\alpha_0 = 0$, that is, taking into account formula
(9.22), the equality

$$a_{00}\alpha_0 + a_{01}\alpha_1 + a_{02}\alpha_2 + \cdots + a_{0n}\alpha_n = 0$$

is possible only for $\alpha_0 = 0$. Obviously, this latter condition is equivalent to the conditions
$a_{0i} = 0$ for all $i = 1, \ldots, n$. In this case, the common denominator of the
linear fractional formulas (9.24) reduces to the constant $a_{00}$. From the nonsingularity
of the transformation $\mathcal{A}$, it follows that $a_{00} \neq 0$, and we can divide the numerators
in equalities (9.24) by $a_{00}$. We then obtain precisely the formulas for affine
transformations (8.17). Thus affine transformations are special cases of projective
transformations, namely, those that take the set of "points at infinity" to itself.

Example 9.15 In the case $\dim \mathbb{P}(\mathsf{L}) = 1$, the projective line $\mathbb{P}(\mathsf{L})$ has a single
inhomogeneous coordinate, and formula (9.24) assumes the form

$$y = \frac{a + bx}{c + dx}, \qquad ad - bc \neq 0.$$

Transformations of the "finite part" of the projective line ($x \neq \infty$) are affine and
have the form $y = \alpha + \beta x$, where $\beta \neq 0$.

9.3 The Cross Ratio

Let us recall that in Sect. 8.2, we defined the affine ratio $(A, B, C)$ among three
collinear points of an affine space, and then, in Sect. 8.3, it was proved (Theorem 8.28)
that the affine ratio $(A, B, C)$ among three collinear points does not
change under a nonsingular affine transformation. In projective spaces, the notion
of a relationship among three collinear points cannot be given a natural analogue.
This is the result of the following assertion.

Theorem 9.16 Let $A_1, B_1, C_1$ and $A_2, B_2, C_2$ be two triples of points in a projective
space satisfying the following conditions:

(a) The three points in each triple are distinct.

(b) The points in each triple are collinear (one line for each triple).

Then there exists a projective transformation taking one triple into the other.

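Proof Let us denote the line on which the three points $A_i, B_i, C_i$ lie by $l_i$, where
$i = 1, 2$. The points $A_1, B_1, C_1$ are independent on $l_1$, and the points $A_2, B_2, C_2$ are
independent on $l_2$. Let the point $A_i$ be determined by the line $\langle \boldsymbol{e}_i \rangle$, the point $B_i$ by the
line $\langle \boldsymbol{f}_i \rangle$, the point $C_i$ by the line $\langle \boldsymbol{g}_i \rangle$, and the line $l_i$ by the two-dimensional space $\mathsf{L}_i$,
$i = 1, 2$. They are all contained in the space $\mathsf{L}$ that determines our projective space.
Repeating the proof of Theorem 9.14 verbatim, we shall construct an isomorphism
$\mathcal{A}' : \mathsf{L}_1 \to \mathsf{L}_2$ taking the lines $\langle \boldsymbol{e}_1 \rangle$, $\langle \boldsymbol{f}_1 \rangle$, $\langle \boldsymbol{g}_1 \rangle$ to the lines $\langle \boldsymbol{e}_2 \rangle$, $\langle \boldsymbol{f}_2 \rangle$, $\langle \boldsymbol{g}_2 \rangle$
respectively. Let us represent the space $\mathsf{L}$ in the form of two decompositions:

$$\mathsf{L} = \mathsf{L}_1 \oplus \mathsf{L}'_1, \qquad \mathsf{L} = \mathsf{L}_2 \oplus \mathsf{L}'_2.$$

It is obvious that $\dim \mathsf{L}'_1 = \dim \mathsf{L}'_2 = \dim \mathsf{L} - 2$, and therefore, the spaces $\mathsf{L}'_1$ and
$\mathsf{L}'_2$ are isomorphic. We shall choose some isomorphism $\mathcal{A}'' : \mathsf{L}'_1 \to \mathsf{L}'_2$ and define a
transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ as $\mathcal{A}'$ on $\mathsf{L}_1$ and as $\mathcal{A}''$ on $\mathsf{L}'_1$, while for arbitrary vectors
$\boldsymbol{x} \in \mathsf{L}$, we shall use the decomposition $\boldsymbol{x} = \boldsymbol{x}_1 + \boldsymbol{x}'_1$, $\boldsymbol{x}_1 \in \mathsf{L}_1$, $\boldsymbol{x}'_1 \in \mathsf{L}'_1$, to define
$\mathcal{A}(\boldsymbol{x}) = \mathcal{A}'(\boldsymbol{x}_1) + \mathcal{A}''(\boldsymbol{x}'_1)$. It is easy to see that $\mathcal{A}$ is a nonsingular linear
transformation, and the projective transformation $\mathbb{P}(\mathcal{A})$ takes the triple of points $A_1, B_1, C_1$
to $A_2, B_2, C_2$. □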

Analogously to the fact that for a triple of collinear points $A, B, C$ of an affine
space, there is an associated number $(A, B, C)$ that is unchanged under every nonsingular
affine transformation, in a projective space we can associate with a quadruple
of collinear points $A_1, A_2, A_3, A_4$ a number that does not change under projective
transformations. This number is denoted by $(A_1, A_2, A_3, A_4)$ and is called the
cross or anharmonic ratio of these four points. We now turn to its definition.

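Let us consider first the projective line $l = \mathbb{P}(\mathsf{L})$, where $\dim \mathsf{L} = 2$. Four arbitrary
points $A_1, A_2, A_3, A_4$ on $l$ correspond to four lines $\langle \boldsymbol{a}_1 \rangle, \langle \boldsymbol{a}_2 \rangle, \langle \boldsymbol{a}_3 \rangle, \langle \boldsymbol{a}_4 \rangle$ lying in the
plane $\mathsf{L}$. In the plane $\mathsf{L}$, let us choose a basis $\boldsymbol{e}_1, \boldsymbol{e}_2$ and consider the decomposition
of the vectors $\boldsymbol{a}_i$ in this basis: $\boldsymbol{a}_i = x_i\boldsymbol{e}_1 + y_i\boldsymbol{e}_2$, $i = 1, \ldots, 4$. The coordinates of the
vectors $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_4$ can be written as the columns of the matrix

$$M = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \end{pmatrix}.$$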

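Consider the following question: how do the minors of order 2 of the matrix $M$
change under a transition to another basis $\boldsymbol{e}'_1, \boldsymbol{e}'_2$ of the plane $\mathsf{L}$? Let us denote by
$[\boldsymbol{a}_i]$ and $[\boldsymbol{a}_i]'$ the columns of the coordinates of the vector $\boldsymbol{a}_i$ in the bases $(\boldsymbol{e}_1, \boldsymbol{e}_2)$
and $(\boldsymbol{e}'_1, \boldsymbol{e}'_2)$ respectively:

$$[\boldsymbol{a}_i] = \begin{pmatrix} x_i \\ y_i \end{pmatrix}, \qquad [\boldsymbol{a}_i]' = \begin{pmatrix} x'_i \\ y'_i \end{pmatrix}.$$

By formula (3.36) for changing coordinates, they are related by $[\boldsymbol{a}_i] = C[\boldsymbol{a}_i]'$,
where $C$ is the transition matrix from the basis $\boldsymbol{e}'_1, \boldsymbol{e}'_2$ to the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$. From this
it follows that

$$\begin{pmatrix} x_i & x_j \\ y_i & y_j \end{pmatrix} = C \begin{pmatrix} x'_i & x'_j \\ y'_i & y'_j \end{pmatrix}$$

for any choice of indices $i$ and $j$, and by the theorem on multiplication of determinants,
we obtain

$$\begin{vmatrix} x_i & x_j \\ y_i & y_j \end{vmatrix} = |C| \cdot \begin{vmatrix} x'_i & x'_j \\ y'_i & y'_j \end{vmatrix},$$

where $|C| \neq 0$. This means that for any three indices $i, j, k$, the ratio

$$\frac{\begin{vmatrix} x_i & x_j \\ y_i & y_j \end{vmatrix}}{\begin{vmatrix} x_i & x_k \\ y_i & y_k \end{vmatrix}} = \frac{\begin{vmatrix} x'_i & x'_j \\ y'_i & y'_j \end{vmatrix}}{\begin{vmatrix} x'_i & x'_k \\ y'_i & y'_k \end{vmatrix}} \tag{9.25}$$

is unaltered under a change of basis (we assume now that both determinants, in
the numerator and denominator, are nonzero). Thus relationship (9.25) determines a
number $(\boldsymbol{a}_i, \boldsymbol{a}_j, \boldsymbol{a}_k)$ depending on the three vectors $\boldsymbol{a}_i, \boldsymbol{a}_j, \boldsymbol{a}_k$ but not on the choice
of basis in $\mathsf{L}$.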

However, this is not yet what we promised: the points $A_i$ indeed determine the
lines $\langle \boldsymbol{a}_i \rangle$, but not the vectors $\boldsymbol{a}_i$. We know that the vector $\boldsymbol{a}'_i$ determines the same
line as the vector $\boldsymbol{a}_i$ if and only if $\boldsymbol{a}'_i = \lambda_i\boldsymbol{a}_i$, $\lambda_i \neq 0$. Therefore, if in expression
(9.25) we replace the coordinates of the vectors $\boldsymbol{a}_i, \boldsymbol{a}_j, \boldsymbol{a}_k$ with the coordinates of
the proportional vectors $\boldsymbol{a}'_i, \boldsymbol{a}'_j, \boldsymbol{a}'_k$, then its numerator will be multiplied by $\lambda_i\lambda_j$,
while its denominator will be multiplied by $\lambda_i\lambda_k$, with the result that the entire
expression (9.25) will be multiplied by the number $\lambda_j\lambda_k^{-1}$, which means that it will
change.
However, if we now consider the expression

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \frac{\begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix} \cdot \begin{vmatrix} x_2 & x_4 \\ y_2 & y_4 \end{vmatrix}}{\begin{vmatrix} x_1 & x_4 \\ y_1 & y_4 \end{vmatrix} \cdot \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix}}, \tag{9.26}$$

then as our previous reasoning demonstrates, it will depend neither on the choice
of basis of the plane $\mathsf{L}$ nor on the choice of vectors $\boldsymbol{a}_i$ on the lines $\langle \boldsymbol{a}_i \rangle$, but will
be determined only by the four points $A_1, A_2, A_3, A_4$ on the projective line $l$. It is
expression (9.26) that is called the cross ratio of these four points.

Let us write the expression for $\mathrm{DV}(A_1, A_2, A_3, A_4)$ assuming that homogeneous
coordinates have been introduced on the projective line $l$. Let us begin with the
formula written in the homogeneous coordinates $(x : y)$. We shall now consider the
points $A_i$ "finite" points of $l$, that is, we assume that $y_i \neq 0$ for all $i = 1, \ldots, 4$, and
we set $t_i = x_i/y_i$; these will be the coordinates of the points $A_i$ in the "affine part"
of the projective line $l$. Then we obtain

$$\begin{vmatrix} x_i & x_j \\ y_i & y_j \end{vmatrix} = y_i y_j \begin{vmatrix} t_i & t_j \\ 1 & 1 \end{vmatrix} = y_i y_j (t_i - t_j).$$

Substituting these expressions into formula (9.26), we see that all the $y_i$ cancel, and
as a result, we obtain the expression

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \frac{(t_1 - t_3)(t_2 - t_4)}{(t_1 - t_4)(t_2 - t_3)}. \tag{9.27}$$


If we assume that all four points $A_1, A_2, A_3, A_4$ lie in the "finite part" of the
plane, then this means in particular that they belong to the affine part of the projective
line $l$ and have finite coordinates $t_1, t_2, t_3, t_4$ on the projective line $l$. Taking into
account formula (8.8) for the affine ratio of three points, we observe that then the
expression for the cross ratio takes the form

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \frac{(A_3, A_2, A_1)}{(A_4, A_2, A_1)}. \tag{9.28}$$

Equality (9.28) shows the connection between the cross ratio and the affine ratio
introduced in Sect. 8.2.

We have determined the cross ratio for four distinct points. In the case in which
two of these points coincide, it is possible to define this ratio under some natural
conventions (as we did for the affine ratio), setting the cross ratio in some cases
equal to $\infty$. However, the cross ratio remains undefined if three of the four points
coincide.

The above reasoning almost contains the proof of the following fundamental 
property of the cross ratio. 


Theorem 9.17 The cross ratio of four collinear points in a projective space does
not change under a projective transformation of the space.

Fig. 9.3 Perspective mapping



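Proof Let $A_1, A_2, A_3, A_4$ be four points lying on the line $l'$ in some projective space
$\mathbb{P}(\mathsf{L})$. They correspond to the four lines $\langle \boldsymbol{a}_1 \rangle, \langle \boldsymbol{a}_2 \rangle, \langle \boldsymbol{a}_3 \rangle, \langle \boldsymbol{a}_4 \rangle$ of the space $\mathsf{L}$, and the
line $l'$ corresponds to the two-dimensional subspace $\mathsf{L}' \subset \mathsf{L}$. Let $\mathcal{A}$ be a nonsingular
transformation of the space $\mathsf{L}$, and $\varphi = \mathbb{P}(\mathcal{A})$ the corresponding projective
transformation of the space $\mathbb{P}(\mathsf{L})$. Then by Theorem 9.12, $\varphi(l') = l''$ is another line in
the projective space $\mathbb{P}(\mathsf{L})$; it corresponds to the subspace $\mathcal{A}(\mathsf{L}') \subset \mathsf{L}$ and contains
the four points $\varphi(A_1), \varphi(A_2), \varphi(A_3), \varphi(A_4)$. Let the vectors $\boldsymbol{e}_1, \boldsymbol{e}_2$ form a basis of
$\mathsf{L}'$ and write the vectors $\boldsymbol{a}_i$ as $\boldsymbol{a}_i = x_i\boldsymbol{e}_1 + y_i\boldsymbol{e}_2$, $i = 1, \ldots, 4$. Then the cross ratio
$\mathrm{DV}(A_1, A_2, A_3, A_4)$ is defined by the formula (9.26).

On the other hand, $\mathcal{A}(\boldsymbol{a}_i) = x_i\mathcal{A}(\boldsymbol{e}_1) + y_i\mathcal{A}(\boldsymbol{e}_2)$, and if we use the basis
$\boldsymbol{f}_1 = \mathcal{A}(\boldsymbol{e}_1)$, $\boldsymbol{f}_2 = \mathcal{A}(\boldsymbol{e}_2)$ of the subspace $\mathcal{A}(\mathsf{L}')$, then the cross ratio

$$\mathrm{DV}(\varphi(A_1), \varphi(A_2), \varphi(A_3), \varphi(A_4))$$

is defined by the same formula (9.26), since the coordinates of the vectors $\mathcal{A}(\boldsymbol{a}_i)$ in
the basis $\boldsymbol{f}_1, \boldsymbol{f}_2$ coincide with the coordinates of the vectors $\boldsymbol{a}_i$ in the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$.
But as we have already verified, the cross ratio depends neither on the choice of
basis nor on the choice of vectors $\boldsymbol{a}_i$ that determine the lines $\langle \boldsymbol{a}_i \rangle$. Therefore, it
follows that

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \mathrm{DV}(\varphi(A_1), \varphi(A_2), \varphi(A_3), \varphi(A_4)). \qquad \square$$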
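
The invariance just proved is easy to observe numerically. The Python sketch below computes the cross ratio from formula (9.26) and checks, on one example, that it is unchanged both by a nonsingular linear transformation and by rescaling a representing vector. The function names and the sample data are illustrative assumptions, and integer coordinates are assumed so that the arithmetic is exact.

```python
from fractions import Fraction

def det2(a, b):
    """Determinant of the 2x2 matrix with columns a and b."""
    return a[0] * b[1] - a[1] * b[0]

def cross_ratio(a1, a2, a3, a4):
    """Cross ratio of four points of the projective line, formula (9.26)."""
    return Fraction(det2(a1, a3) * det2(a2, a4), det2(a1, a4) * det2(a2, a3))

def apply(A, v):
    """Apply the 2x2 matrix A to the vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

# Points with affine coordinates t = 0, 1, 2, 3, written as (t : 1).
pts = [(0, 1), (1, 1), (2, 1), (3, 1)]
A = [[2, 1], [1, 1]]                        # nonsingular: determinant 1
images = [apply(A, p) for p in pts]
# Replace the first representing vector by a proportional one.
rescaled = [(5 * images[0][0], 5 * images[0][1])] + images[1:]

print(cross_ratio(*pts), cross_ratio(*images), cross_ratio(*rescaled))
```

For these points, formula (9.27) gives $(0 - 2)(1 - 3)/((0 - 3)(1 - 2)) = 4/3$, and all three printed values agree with it.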

Example 9.18 In a projective plane $\Pi$, let us consider two lines $l_1$ and $l_2$ and a point
$O$ lying on neither of the lines. Let us connect an arbitrary point $A \in l_1$ to the point
$O$ by the line $l_A$; see Fig. 9.3. We shall denote the point of intersection of the lines
$l_A$ and $l_2$ by $A'$. The mapping of the line $l_1$ into $l_2$ that to each point $A \in l_1$ assigns
the point $A' \in l_2$ is called a perspective mapping.

Let us prove that there exists a projective transformation of the plane $\Pi$ defining
a perspective correspondence between the lines $l_1$ and $l_2$. To this end, let us denote
by $l_0$ the line joining the point $O$ and the point $P = l_1 \cap l_2$, and let us consider
the set $V = \Pi \setminus l_0$. In other words, we shall consider $l_0$ a "line at infinity," and the
points of $V$ will be considered "finite points" of the projective plane. Then on $V$, the
perspective correspondence will be given by a bundle of parallel lines, since these
lines in the "finite part" do not intersect; see Fig. 9.4.

Fig. 9.4 A bundle of parallel lines

More precisely, this bundle defines a mapping of the "finite parts" $l'_1$ and $l'_2$ of
the lines $l_1$ and $l_2$. From this it follows that in the affine plane $V$, the lines $l'_1$ and
$l'_2$ are parallel, and the perspective correspondence between them is defined by the
translation $\mathcal{T}_{\boldsymbol{a}}$ by the vector $\boldsymbol{a} = \overrightarrow{AA'}$, where $A$ is an arbitrary point on the
line $l'_1$, and $A'$ is the point on the line $l'_2$ corresponding to it under the perspective
correspondence. As we saw above, every nonsingular affine transformation of an
affine plane $V$ is a projective mapping for $\Pi$, and this is even more obviously the
case for a translation. This means that a perspective correspondence is defined by
some projective transformation of the plane $\Pi$. Therefore, from Theorem 9.17, we
deduce the following result.

Theorem 9.19 The cross ratio of four collinear points is preserved under a
perspective correspondence.


9.4 Topological Properties of Projective Spaces* 

The previous discussion in this chapter was related to a projective space $\mathbb{P}(\mathsf{L})$, where
$\mathsf{L}$ was a finite-dimensional vector space over an arbitrary field $\mathbb{K}$. If our interest is
in a particular field (for example, $\mathbb{R}$ or $\mathbb{C}$), then all the assertions we have proved
remain valid, since we used only general algebraic notions (which derive from the
definition of a field), and nowhere did we use, for example, properties of inequality
or absolute value. Now let us say a few words about properties related to the notion
of convergence, or as they are called, topological properties, of projective spaces. It
makes sense to talk about them if, for example, $\mathsf{L}$ is a real or complex vector space,
that is, the field in question is $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$.

Let us begin by formulating the notion of convergence of a sequence of vectors
$\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_k, \ldots$ in a space $\mathsf{L}$ to a vector $\boldsymbol{x}$ of the same space. Let us choose in $\mathsf{L}$
an arbitrary basis $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ and let us write the vectors $\boldsymbol{x}_k$ and $\boldsymbol{x}$ in this basis:

$$\boldsymbol{x}_k = \alpha_{k0}\boldsymbol{e}_0 + \alpha_{k1}\boldsymbol{e}_1 + \cdots + \alpha_{kn}\boldsymbol{e}_n, \qquad \boldsymbol{x} = \beta_0\boldsymbol{e}_0 + \beta_1\boldsymbol{e}_1 + \cdots + \beta_n\boldsymbol{e}_n.$$

We shall say that the sequence of vectors $\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_k, \ldots$ converges to the vector
$\boldsymbol{x}$ if the sequence of numbers

$$\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{ki}, \ldots \tag{9.29}$$

for fixed $i$ converges to the number $\beta_i$ as $k \to \infty$ for each index $i = 0, 1, \ldots, n$ (in
speaking about complex vector spaces, we assume that the reader is familiar with the
notion of convergence of a sequence of complex numbers). The vector $\boldsymbol{x}$ is called,
in this case, the limit of the sequence. From the formulas for changing coordinates
given in Sect. 3.4, it is easy to derive that the property of convergence does not
depend on the basis in $\mathsf{L}$. We shall write this convergence as $\boldsymbol{x}_k \to \boldsymbol{x}$ as $k \to \infty$.

Let us move now from vectors to points of a projective space. In both cases
that we are considering ($\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$), there is a useful method of normalizing the
homogeneous coordinates $(x_0 : x_1 : \cdots : x_n)$ defined, generally speaking, only up to
multiplication by a common factor $\lambda \neq 0$. Since by definition, the equality $x_i = 0$
for all $i = 0, 1, \ldots, n$ is impossible, we may choose a coordinate $x_r$ for which $|x_r|$
(the absolute value in $\mathbb{R}$ or $\mathbb{C}$, respectively) assumes the greatest value, and setting
$\lambda = |x_r|$, make the substitution $y_i = \lambda^{-1}x_i$ for all $i = 0, 1, \ldots, n$. Then, obviously,

$$(x_0 : x_1 : \cdots : x_n) = (y_0 : y_1 : \cdots : y_n),$$

and moreover, $|y_r| = 1$ and $|y_i| \le 1$ for all $i = 0, 1, \ldots, n$.
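
The normalization just described amounts to a one-line computation. The following Python sketch is a minimal illustration; the function name `normalize` is an assumption made here, and exact fractions are used in the example only to avoid rounding.

```python
from fractions import Fraction

def normalize(x):
    """Divide homogeneous coordinates by the largest absolute value among them,
    so that the result has maximal absolute value 1."""
    lam = max(abs(t) for t in x)
    if lam == 0:
        raise ValueError("homogeneous coordinates cannot all be zero")
    return [t / lam for t in x]

y = normalize([Fraction(3), Fraction(-6), Fraction(2)])
print(y)  # [Fraction(1, 2), Fraction(-1, 1), Fraction(1, 3)]
```

The same function accepts complex entries, since `abs` returns the modulus of a complex number; in floating-point arithmetic the normalized maximum may then differ from 1 by a rounding error.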

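Definition 9.20 A sequence of points $P_1, P_2, \ldots, P_k, \ldots$ converges to the point $P$
if on every line $\langle \boldsymbol{e}_k \rangle$ that determines the point $P_k$, and on the line $\langle \boldsymbol{e} \rangle$ determining
the point $P$, it is possible to find nonnull vectors $\boldsymbol{x}_k$ and $\boldsymbol{x}$ such that $\boldsymbol{x}_k \to \boldsymbol{x}$ as
$k \to \infty$. This is written as $P_k \to P$ as $k \to \infty$. The point $P$ is called the limit of the
sequence $P_1, P_2, \ldots, P_k, \ldots$.

We note that by assumption, $\langle \boldsymbol{e}_k \rangle = \langle \boldsymbol{x}_k \rangle$ and $\langle \boldsymbol{e} \rangle = \langle \boldsymbol{x} \rangle$.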

Theorem 9.21 It is possible to choose from an arbitrary infinite sequence of points
of a projective space a subsequence that converges to a point of the space.

P roof As we hâve seen, every point P of a projective space can be represented in the 
form P = (y), where the vector y has coordinates (yo, yi, . . . , y n ), and moreover, 
max | y/ 1 = 1 . 

It is proved in a course in real analysis that every bounded sequence of real numbers
satisfies the assertion of Theorem 9.21, that is, it contains a convergent subsequence.
It is also very easy to prove the statement for a sequence of complex numbers. To obtain
from this the assertion of the theorem, let us consider an infinite sequence of points
$P_1, P_2, \ldots, P_k, \ldots$ of the projective space $\mathbb{P}(\mathsf{L})$. Let us focus
attention first on the sequence of zeroth (that is, having index 0) coordinates of the vectors
$\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_k, \ldots$ corresponding to
these points. Suppose they are the numbers

$$\alpha_{10}, \alpha_{20}, \ldots, \alpha_{k0}, \ldots \tag{9.30}$$

As we noted above, we may assume that all $|\alpha_{k0}|$ are less than or equal to 1. By the
assertion from real analysis formulated above, from the sequence (9.30), we may
choose a subsequence

$$\alpha_{n_1 0}, \alpha_{n_2 0}, \ldots, \alpha_{n_k 0}, \ldots \tag{9.31}$$

converging to some number $\beta_0$ that therefore also does not exceed 1 in absolute
value. Let us now consider a subsequence of points $P_{n_1}, P_{n_2}, \ldots, P_{n_k}, \ldots$ and of
vectors $\boldsymbol{x}_{n_1}, \boldsymbol{x}_{n_2}, \ldots, \boldsymbol{x}_{n_k}, \ldots$ with the same
indices as those in the subsequence (9.31). Let us focus attention on the first coordinate of
these vectors. For them, clearly, it is also the case that $|\alpha_{n_k 1}| \le 1$. This means that
from the sequence

$$\alpha_{n_1 1}, \alpha_{n_2 1}, \ldots, \alpha_{n_k 1}, \ldots$$

we may choose a subsequence converging to some number $\beta_1$, and moreover, clearly
$|\beta_1| \le 1$.

Repeating this argument $n + 1$ times, we obtain as a result, from the original
sequence of vectors $\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_k, \ldots$, a subsequence
$\boldsymbol{x}_{m_1}, \boldsymbol{x}_{m_2}, \ldots, \boldsymbol{x}_{m_k}, \ldots$ converging to some vector
$\boldsymbol{x} \in \mathsf{L}$, which, like every vector of this space, can be decomposed
in terms of the basis $\boldsymbol{e}_0, \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$, that is,

$$\boldsymbol{x} = \beta_0 \boldsymbol{e}_0 + \beta_1 \boldsymbol{e}_1 + \cdots + \beta_n \boldsymbol{e}_n.$$

This gives us the assertion of Theorem 9.21 if we ascertain that not all coordinates
$\beta_0, \beta_1, \ldots, \beta_n$ of the vector $\boldsymbol{x}$ are equal to zero. But this follows from
the fact that by construction, for each vector $\boldsymbol{x}_{m_k}$ of the subsequence
$\boldsymbol{x}_{m_1}, \boldsymbol{x}_{m_2}, \ldots, \boldsymbol{x}_{m_k}, \ldots$, a certain coordinate
$\alpha_{m_k i}$, $i = 0, \ldots, n$, has absolute value equal to 1. Since there
exists only a finite number of coordinates, and the number of vectors $\boldsymbol{x}_{m_k}$ is
infinite, there must be an index $i$ such that among the coordinates $\alpha_{m_k i}$, infinitely
many will have absolute value 1. On the other hand, by construction, the sequence
$\alpha_{m_1 i}, \alpha_{m_2 i}, \ldots, \alpha_{m_k i}, \ldots$ converges to the number $\beta_i$, which
therefore must have absolute value equal to 1. □

The property established in Theorem 9.21 is called compactness. It holds as well 
for every projective algebraic variety of a projective space (whether real or com- 
plex). We may formulate it as follows. 

Corollary 9.22 In the case of a real or complex space, the points of a projective
algebraic variety form a compact set.

Proof Let the projective algebraic variety $X$ be given by a system of equations (9.5),
and let $P_1, P_2, \ldots, P_k, \ldots$ be a sequence of points in $X$. By Theorem 9.21, there
exists a subsequence of this sequence that converges to some point $P$ of this space. It
remains to prove that the point $P$ belongs to the variety $X$. For this, it suffices to
show that it can be represented in the form $P = \langle \boldsymbol{u} \rangle$, where the coordinates
of the vector $\boldsymbol{u}$ satisfy equations (9.5). But this follows at once from the fact that
polynomials are continuous functions. Let $F(x_0, x_1, \ldots, x_n)$ be a polynomial (in this case,
homogeneous; it is one of the polynomials $F_i$ appearing in the system of equations
(9.5)). We shall write it in the form $F = F(\boldsymbol{x})$, where $\boldsymbol{x} \in \mathsf{L}$. Then
from the convergence of the vectors $\boldsymbol{x}_k \to \boldsymbol{x}$ as $k \to \infty$ such that
$F(\boldsymbol{x}_k) = 0$ for all $k$, it follows that $F(\boldsymbol{x}) = 0$. □

For subsets of a finite-dimensional vector or affine space (whether real or complex),
the property of compactness is related to their boundedness: more precisely,
the property of boundedness follows from compactness. Thus while real and complex
vector or affine spaces can be visualized as “extending unboundedly in all directions,”
for projective spaces, such is not the case. But what does it mean to say
“can be visualized”? In order to formulate this intuitive idea precisely, we shall
introduce for the real and complex projective lines some simple geometric representations
to which they are homeomorphic (see the relevant definition on p. xviii). This
will allow us to give a precise meaning to the words that a given set “can be visualized.”
Let us observe that the property of compactness established in Theorem 9.21
is unchanged under a transition from one set to another that is homeomorphic to it.

Fig. 9.5 The real projective line

Let us begin with the simplest situation: a one-dimensional real projective space,
that is, the real projective line. It consists of pairs $(x_0 : x_1)$, where $x_0$ and $x_1$ are
considered only up to a common factor $\lambda \neq 0$. Those pairs for which $x_0 \neq 0$ form
an affine subset $U$, whose points are given by the single coordinate $t = x_1/x_0$, so
that we may identify the set $U$ with $\mathbb{R}$. Pairs for which $x_0 = 0$ do not enter the set
$U$, but they correspond to only one point $(0 : 1)$ of the projective line, which we
shall denote by $(\infty)$. Thus the real projective line can be represented in the form
$\mathbb{R} \cup (\infty)$.

The convergence of points $P_k \to Q$ as $k \to \infty$ is defined in this case as follows.
If points $P_k \neq (\infty)$ correspond to the numbers $t_k$, and the point $Q \neq (\infty)$
corresponds to the number $t$, then $P_k = (\alpha_k : \beta_k)$ and $Q = (\alpha : \beta)$, where
$\beta_k/\alpha_k = t_k$, $\alpha_k \neq 0$, and $\beta/\alpha = t$, $\alpha \neq 0$. The convergence
$P_k \to Q$ as $k \to \infty$ in this case implies the convergence of the sequence of numbers
$t_k \to t$ as $k \to \infty$. In the case that $P_k \to (\infty)$, the convergence (in the previous
notation) means that $\alpha_k \to 0$, $\beta_k \to 1$ as $k \to \infty$, from which it follows that
$t_k^{-1} \to 0$, or equivalently, $|t_k| \to \infty$ as $k \to \infty$.
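
The second case can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the helper names, the normalization by the largest absolute value, and the Euclidean distance between representatives are choices made here, not notation from the text.

```python
import math

def normalize(coords):
    """Rescale (alpha : beta) so that the largest absolute value equals 1."""
    lam = max(abs(x) for x in coords)
    return tuple(x / lam for x in coords)

def point_from_t(t):
    """The point (1 : t) of the affine part U, with normalized coordinates."""
    return normalize((1.0, t))

def dist_to_infinity(p):
    """Distance from a representative of p to the nearer of (0, 1) and (0, -1)."""
    alpha, beta = p
    # Representatives are only defined up to sign, so both (0, 1) and (0, -1) stand for (oo).
    return min(math.hypot(alpha, beta - 1.0), math.hypot(alpha, beta + 1.0))

# As |t_k| grows, suitable representatives of P_k approach (0 : 1) = (oo).
distances = [dist_to_infinity(point_from_t(10.0 ** k)) for k in range(1, 6)]
```

The list `distances` decreases toward zero, in agreement with the statement that $|t_k| \to \infty$ corresponds to $P_k \to (\infty)$.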

We can graphically represent the real projective line by drawing a circle tangent
to the horizontal line $l$ at the point $O$; see Fig. 9.5. Connecting the highest point $O'$
of this circle with an arbitrary point $A$ of the circle, we obtain a line that intersects $l$
at some point $B$. We thereby obtain a bijection between points $A \neq O'$ of the circle
and all the points $B$ of the line $l$. If we place the coordinate origin of the line $l$ at the
point $O$ and associate with each point $B \in l$ a number $t \in \mathbb{R}$ resulting from a choice
of some unit measure on the line $l$ (that is, an arbitrary point of the line $l$ different
from $O$ is given the value 1), then we obtain a bijection between numbers $t \in \mathbb{R}$
and points $A \neq O'$ of the circle. Then $|t_k| \to \infty$ if and only if for the corresponding
points $A_k$ of the circle, we have the convergence $A_k \to O'$. Consequently, we obtain
a bijection between points of the real projective line $\mathbb{R} \cup (\infty)$ and all points of the
circle that preserves the notion of convergence. Thus we have proved that the real
projective line is homeomorphic to the circle, which is usually denoted by $S^1$ (the
one-dimensional sphere).

Fig. 9.6 Stereographic projection of the sphere onto the plane

An analogous argument can be applied to the complex projective line. It is represented
in the form $\mathbb{C} \cup (\infty)$. On it, the convergence of a sequence of points $P_k \to Q$
as $k \to \infty$ in the case $Q \neq (\infty)$ corresponds to convergence of a sequence of complex
numbers $z_k \to z$, where $z \in \mathbb{C}$, while the convergence of the sequence of points
$P_k \to (\infty)$ corresponds to the convergence $|z_k| \to \infty$ (here $|z|$ denotes the modulus
of the complex number $z$).

For the graphical representation of the complex projective line, Riemann proposed
the following method; see Fig. 9.6. The complex numbers are depicted in the
usual way as points in a plane. Let us consider a sphere tangent to this plane at the
origin $O$, which corresponds to the complex number $z = 0$. Through the highest
point $O'$ of the sphere and any other point $A$ of the sphere there passes a line
intersecting the complex plane at a point $B$, which represents some number $z \in \mathbb{C}$.
This yields a bijection between numbers $z \in \mathbb{C}$ and all the points of the sphere, with
the exception of the point $O'$; see Fig. 9.6. This correspondence is often called the
stereographic projection of the sphere onto the plane. By associating the point $(\infty)$
of the complex projective line with the point $O'$ of the sphere, we obtain a bijection
between the points of the complex projective line $\mathbb{C} \cup (\infty)$ and all the points
of the sphere. It is easy to see that convergence is preserved under this assignment.
Thus the complex projective line is homeomorphic to the two-dimensional sphere
in three-dimensional space, which is denoted by $S^2$.
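
The stereographic projection can be written out in coordinates. The sketch below makes assumptions that the text leaves open: the sphere is taken to have radius 1/2 and center (0, 0, 1/2), so that it touches the plane at the origin and its highest point is O' = (0, 0, 1); the function names are hypothetical.

```python
def to_sphere(z):
    """Point A of the sphere lying on the line through O' = (0, 0, 1) and B = (Re z, Im z, 0)."""
    r2 = abs(z) ** 2
    s = 1.0 / (1.0 + r2)          # parameter of the intersection point on the line O'B
    return (z.real * s, z.imag * s, r2 * s)

def to_plane(point):
    """Inverse map: project a point A != O' of the sphere back to the complex plane."""
    x, y, h = point
    if h == 1.0:
        raise ValueError("O' corresponds to the point (oo), not to a complex number")
    return complex(x, y) / (1.0 - h)

origin = to_sphere(0j)            # z = 0 goes to the point of tangency O
far = to_sphere(1e6 + 0j)         # a number of large modulus lands close to O'
```

With this choice of sphere, the height of `to_sphere(z)` is $|z|^2/(1+|z|^2)$, which tends to 1 as $|z| \to \infty$; this is one way to see that convergence to $(\infty)$ corresponds to convergence to $O'$.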

In the sequel, we shall limit our consideration to projective spaces $\mathbb{P}(\mathsf{L})$, where
$\mathsf{L}$ is a real vector space of some finite dimension, and we shall consider for such spaces
the property of orientability. It is related to the concept of continuous deformation
of a linear transformation, which was introduced in Sect. 4.4.

By definition, every projective transformation of a projective space $\mathbb{P}(\mathsf{L})$ has the
form $\mathbb{P}(\mathcal{A})$, where $\mathcal{A}$ is a nonsingular linear transformation of the vector
space $\mathsf{L}$. Moreover, as we have seen, the linear transformation $\mathcal{A}$ is determined by the
projective transformation up to a replacement by $\alpha\mathcal{A}$, where $\alpha$ is any nonzero number.

Definition 9.23 A projective transformation is said to be continuously deformable
into another if the first can be represented in the form $\mathbb{P}(\mathcal{A}_1)$ and the second in the
form $\mathbb{P}(\mathcal{A}_2)$, and the linear transformation $\mathcal{A}_1$ is continuously deformable into
$\mathcal{A}_2$.

Theorem 4.39 asserts that a linear transformation $\mathcal{A}_1$ is continuously deformable
into $\mathcal{A}_2$ if and only if the determinants $|\mathcal{A}_1|$ and $|\mathcal{A}_2|$ have the same sign. What
happens under a replacement of $\mathcal{A}$ by $\alpha\mathcal{A}$? Let the projective space
$\mathbb{P}(\mathsf{L})$ have dimension $n$. Then the vector space $\mathsf{L}$ has dimension $n + 1$, and
$|\alpha\mathcal{A}| = \alpha^{n+1}|\mathcal{A}|$. If the number $n + 1$ is even, then it is always the case that
$\alpha^{n+1} > 0$, and such a replacement does not change the sign of the determinant. In other
words, in a projective space of odd dimension $n$, the sign of the determinant $|\mathcal{A}|$ of a linear
transformation $\mathcal{A}$ is uniquely determined by the transformation $\mathbb{P}(\mathcal{A})$. This clearly
yields the following result.

Theorem 9.24 In a projective space of odd dimension, a projective transformation
$\mathbb{P}(\mathcal{A}_1)$ is continuously deformable into $\mathbb{P}(\mathcal{A}_2)$ if and only if the
determinants $|\mathcal{A}_1|$ and $|\mathcal{A}_2|$ have the same sign.
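
The role of the parity of $n + 1$ can be checked on small matrices. This is a minimal sketch with integer matrices chosen arbitrarily for illustration; `det` and `scale` are hypothetical helper names, and the determinant is computed by Laplace expansion, which is adequate only for small sizes.

```python
def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def scale(alpha, m):
    """Matrix of the transformation alpha * A."""
    return [[alpha * x for x in row] for row in m]

# dim L = 4, so the projective space has odd dimension n = 3.
A4 = [[2, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 3], [1, 0, 0, 1]]
# dim L = 3, so the projective space has even dimension n = 2.
A3 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]

same_sign_odd_n = (det(A4) > 0) == (det(scale(-2, A4)) > 0)      # sign is kept
flipped_sign_even_n = det(scale(-1, A3)) == -det(A3)             # sign is reversed
```

In the first case the factor is $(-2)^4 > 0$, so the sign of the determinant is an invariant of the projective transformation; in the second case the factor $(-1)^3$ reverses the sign, which is the situation treated in Theorem 9.25.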

The same considerations can be applied to projective spaces of even dimension,
but they lead to a different result.

Theorem 9.25 In a projective space of even dimension, every projective transformation
is continuously deformable into every other projective transformation.

Proof Let us show that every projective transformation $\mathbb{P}(\mathcal{A})$ is continuously
deformable into the identity. If $|\mathcal{A}| > 0$, then this follows at once from Theorem 4.39.
And if $|\mathcal{A}| < 0$, then the same theorem gives us that the transformation $\mathcal{A}$ is
continuously deformable into $\mathcal{B}$, which has matrix
$\begin{pmatrix} -1 & 0 \\ 0 & E_n \end{pmatrix}$, where $E_n$ is the identity matrix
of order $n$. But $\mathbb{P}(\mathcal{B}) = \mathbb{P}(-\mathcal{B})$, and the transformation $-\mathcal{B}$ has matrix
$\begin{pmatrix} 1 & 0 \\ 0 & -E_n \end{pmatrix}$.
Since in our case, the number $n$ is even, it follows that $|-E_n| = (-1)^n > 0$, and
by Theorem 4.38, the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -E_n \end{pmatrix}$ is continuously deformable
into $E_{n+1}$, and consequently, the transformation $-\mathcal{B}$ is continuously deformable into the
identity. Thus the projective transformation $\mathbb{P}(\mathcal{B})$ is continuously deformable into
$\mathbb{P}(\mathcal{E})$, and this means, by definition, that $\mathbb{P}(\mathcal{A})$ is also continuously
deformable into $\mathbb{P}(\mathcal{E})$. □

Expressing these facts in topological form, we may say that the set of projective
transformations of the space $\mathbb{P}^n$ of a given dimension has a single path-connected
component if $n$ is even, and two path-connected components if $n$ is odd.

Theorems 9.24 and 9.25 show that the properties of projective spaces of even and 
odd dimension are radically different. We encounter this for the first time in the case 
of the projective plane. It differs from the vector (or Euclidean) plane in that it has 
not two, but only one orientation. It is the same with projective spaces of arbitrary 
even dimension. We saw in Sect. 4.4 that the orientation of the affine plane can be 
interpreted as a choice of direction of motion around a circle. Theorem 9.25 shows 
that in the projective plane, this is already not the case — the continuous motion in 
a given direction around a circle in the projective plane can be transformed into 
motion in the opposite direction. This is possible only because our deformation at a 
certain moment “passes through infinity,” which is impossible in the affine plane. 

This property can be presented graphically using the following construction, 
which is applicable to real projective spaces of arbitrary dimension. 

Fig. 9.7 A model of the projective plane

Fig. 9.8 Identification of points

Let us assume that the vector space $\mathsf{L}$ defining our projective space $\mathbb{P}(\mathsf{L})$ is a
Euclidean space, and let us consider in this space the sphere $S$, defined by the equality
$|\boldsymbol{x}| = 1$. Every line $\langle \boldsymbol{x} \rangle$ of the space $\mathsf{L}$ intersects the sphere $S$.
Indeed, such a line consists of vectors of the form $\alpha\boldsymbol{x}$, where $\alpha \in \mathbb{R}$, and the
condition $\alpha\boldsymbol{x} \in S$ means that $|\alpha\boldsymbol{x}| = 1$. Since
$|\alpha\boldsymbol{x}| = |\alpha| \cdot |\boldsymbol{x}|$ and $\boldsymbol{x} \neq \boldsymbol{0}$, we may set
$|\alpha| = |\boldsymbol{x}|^{-1}$. With this choice, the number $\alpha$ is determined up to sign, or in other
words, there exist two vectors, $\boldsymbol{e}$ and $-\boldsymbol{e}$, belonging to the line
$\langle \boldsymbol{x} \rangle$ and to the sphere $S$. Thus associating with each vector
$\boldsymbol{e} \in S$ the line $\langle \boldsymbol{x} \rangle$ of the projective space that contains it, we obtain the
mapping $f : S \to \mathbb{P}(\mathsf{L})$. The previous reasoning shows that the image of $f$ is the entire space
$\mathbb{P}(\mathsf{L})$. However, this mapping $f$ is not a bijection, since two points of the sphere $S$
are mapped to one point $P \in \mathbb{P}(\mathsf{L})$, corresponding to the line $\langle \boldsymbol{x} \rangle$,
namely, the vectors $\boldsymbol{e}$ and $-\boldsymbol{e}$. This property is expressed by saying that the
projective space is obtained from the sphere $S$ via the identification of its antipodal points.

Let us apply this to the case of the projective plane, that is, we shall suppose
that $\dim \mathbb{P}(\mathsf{L}) = 2$. Then $\dim \mathsf{L} = 3$, and the sphere $S$ contained in
three-dimensional space is the sphere $S^2$. Let us decompose it into two equal parts by a
horizontal plane; see Fig. 9.7.

Each point of the upper hemisphere is diametrically opposite some point on the
lower hemisphere, and we can map the upper hemisphere onto the projective plane
$\mathbb{P}(\mathsf{L})$ by representing each point $P \in \mathbb{P}(\mathsf{L})$ in the form
$\langle \boldsymbol{e} \rangle$, where $\boldsymbol{e}$ is a vector of the upper hemisphere.

However, this correspondence will not be a bijection, since antipodal points on 
the boundary of the hemisphere will be joined together, that is, they correspond to 
a single point; see Fig. 9.8. This is expressed by saying that the projective plane is 
obtained by identifying antipodal points of the boundary of the hemisphere. 

Let us now consider a moving circle with a given direction of rotation; see
Fig. 9.9. The figure shows that when the moving circle intersects the boundary
of the hemisphere, the direction of rotation changes to its opposite.

This property is expressed by saying that the projective plane is a one-sided
surface (while the sphere in three-dimensional space and other familiar surfaces
are two-sided). This property of the projective plane was studied by Möbius. He
presented an example of a one-sided surface that is now known as the Möbius
strip. It can be constructed by cutting from a sheet of paper the rectangle $ABDC$
(Fig. 9.10, left) and gluing together its opposite sides $AB$ and $CD$, after rotating
$CD$ by 180°. The one-sided surface thus obtained is shown in the right-hand picture
of Fig. 9.10, which also shows the continuous deformation of the circle (stages
1 → 2 → 3 → 4), changing the direction of rotation to its opposite.

Fig. 9.9 Motion of a circle

Fig. 9.10 Möbius strip

Fig. 9.11 Partition of the sphere

Fig. 9.12 The central part of the sphere

The Möbius strip also has a direct relationship to the projective plane. Namely,
let us visualize this plane as the sphere $S^2$, in which antipodal points are identified.
Let us divide the sphere into three parts by intersecting it with two parallel planes
that pass above and below the equator. As a result, the sphere is partitioned into a
central part $U$ and two “caps” above and below; see Fig. 9.11.

Let us begin by studying the central section $U$. For each point of $U$, its antipodal
point is also contained in $U$. Let us divide $U$ into two halves, front and back, by
a vertical plane intersecting $U$ in the arcs $AB$ and $CD$; see Fig. 9.12.

We may combine the front half ($U'$) with the rectangle $ABDC$ in Fig. 9.10.
Every point of the central section $U$ either itself belongs to the front half or else has
an antipodal point that belongs to the front half, of which there is only one, except
for the points of the segments $AB$ and $CD$. In order to obtain only one of the two
antipodal points of these segments, we must glue these segments together exactly as
is done in Fig. 9.10. Thus the Möbius strip is homeomorphic to the part $U'$ of the
projective plane. To obtain the remaining part $V = \mathbb{P}(\mathsf{L}) \setminus U'$, we have to consider
the “caps” on the sphere; see Fig. 9.11. For every point in a cap, its antipodal point
lies in the other cap. This means that by identifying antipodal points, it suffices to
consider only one cap, for example the upper one. This cap is homeomorphic to a
disk: to see this, it suffices simply to project it onto the horizontal plane. Clearly,
the boundary of the upper cap is identified with the boundary of the central part
of the sphere. Thus the projective plane is homeomorphic to the surface obtained
by gluing a disk to the Möbius strip in such a way that its boundary is identified
with the boundary of the Möbius strip (it is easily verified that the boundary of the
Möbius strip is a circle).


Chapter 10 

The Exterior Product and Exterior Algebras 


10.1 Plücker Coordinates of a Subspace 

The fundamental idea of analytic geometry, which goes back to Fermat and
Descartes, consists in the fact that every point of the two-dimensional plane or
three-dimensional space is defined by its coordinates (two or three, respectively).
Of course, there must also be present a particular choice of coordinate system. In
this course, we have seen that this very principle is applicable to many spaces of
more general types: vector spaces of arbitrary dimension, as well as Euclidean,
affine, and projective spaces. In this chapter, we shall show that it can be applied
to the study of vector subspaces $\mathsf{M}$ of fixed dimension $m$ in a given vector space
$\mathsf{L}$ of dimension $n > m$. Since there is a bijection between the $m$-dimensional
subspaces $\mathsf{M} \subset \mathsf{L}$ and $(m - 1)$-dimensional projective subspaces
$\mathbb{P}(\mathsf{M}) \subset \mathbb{P}(\mathsf{L})$, we shall therefore also obtain a description of the projective
subspaces of fixed dimension of a projective space with the aid of “coordinates” (certain
collections of numbers).

The case of points of a projective space (subspaces of dimension 0) was already
analyzed in the previous chapter: they are given by homogeneous coordinates. The
same holds in the case of hyperplanes of a projective space $\mathbb{P}(\mathsf{L})$: they correspond
to the points of the dual space $\mathbb{P}(\mathsf{L}^*)$. The simplest case in which the problem is
not reduced to these two cases given above is the set of projective lines in
three-dimensional projective space. Here a solution was proposed by Plücker. Therefore,
in the most general case, the “coordinates” corresponding to the subspace
are called Plücker coordinates. Following the course of history, we shall begin in
Sects. 10.1 and 10.2 by describing these using some coordinate system, and then
investigate the construction we have introduced in an invariant way, in order to
determine which of its elements depend on the choice of coordinate system and which
do not.

Therefore, we now assume that some basis has been chosen in the vector space $\mathsf{L}$.
Since $\dim \mathsf{L} = n$, every vector $\boldsymbol{a} \in \mathsf{L}$ has in this basis $n$ coordinates.
Let us consider a subspace $\mathsf{M} \subset \mathsf{L}$ of dimension $m < n$. Let us choose an arbitrary
basis $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ of the subspace $\mathsf{M}$. Then
$\mathsf{M} = \langle \boldsymbol{a}_1, \ldots, \boldsymbol{a}_m \rangle$, and the vectors
$\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ are linearly independent. The vector $\boldsymbol{a}_i$ has, in the
chosen basis of the space $\mathsf{L}$, coordinates $a_{i1}, \ldots, a_{in}$ ($i = 1, \ldots, m$), which we can
arrange in the form of a matrix $M$ of type $(m, n)$, writing them in row form:

$$M = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \tag{10.1}$$

The condition that the vectors $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ are linearly independent means that
the rank of the matrix $M$ is equal to $m$, that is, one of its minors of order $m$ is nonzero.
Since the number of rows of the matrix $M$ is equal to $m$, a minor of order $m$ is
uniquely defined by the indices of its columns. Let us denote by $M_{i_1,\ldots,i_m}$ the minor
consisting of columns with indices $i_1, \ldots, i_m$, which assume the various values from
1 to $n$.

We know that not all of the minors $M_{i_1,\ldots,i_m}$ can be equal to zero at the same
time. Let us examine how they depend on the choice of basis $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ in
$\mathsf{M}$. If $\boldsymbol{b}_1, \ldots, \boldsymbol{b}_m$ is some other basis of this subspace, then

$$\boldsymbol{b}_i = b_{i1}\boldsymbol{a}_1 + \cdots + b_{im}\boldsymbol{a}_m, \quad i = 1, \ldots, m.$$

Since the vectors $\boldsymbol{b}_1, \ldots, \boldsymbol{b}_m$ are linearly independent, the determinant
$|(b_{ij})|$ is nonzero. Let us set $c = |(b_{ij})|$. If $M'_{i_1,\ldots,i_m}$ is a minor of the matrix $M'$,
constructed analogously to $M$ using the vectors $\boldsymbol{b}_1, \ldots, \boldsymbol{b}_m$, then by formula
(3.35) and Theorem 2.54 on the determinant of a product of matrices, we have the relationship

$$M'_{i_1,\ldots,i_m} = c\,M_{i_1,\ldots,i_m}. \tag{10.2}$$

The numbers $M_{i_1,\ldots,i_m}$ that we have determined are not independent. Namely, if
the unordered collection of numbers $j_1, \ldots, j_m$ coincides with $i_1, \ldots, i_m$ (that is,
comprises the same numbers, perhaps arranged in a different order), then as we saw
in Sect. 2.6, we have the relationship

$$M_{j_1,\ldots,j_m} = \pm M_{i_1,\ldots,i_m}, \tag{10.3}$$

where the sign $+$ or $-$ appears depending on whether the number of transpositions
necessary to effect the passage from the collection $(i_1, \ldots, i_m)$ to $(j_1, \ldots, j_m)$ is
even or odd. In other words, the function $M_{i_1,\ldots,i_m}$ of $m$ arguments $i_1, \ldots, i_m$
assuming the values $1, \ldots, n$ is antisymmetric.

In particular, we may take as the collection $(j_1, \ldots, j_m)$ the arrangement of
the numbers $i_1, \ldots, i_m$ in ascending order, and the corresponding minor
$M_{j_1,\ldots,j_m}$ will coincide with either $M_{i_1,\ldots,i_m}$ or $-M_{i_1,\ldots,i_m}$. In view of this,
in the original notation, we shall assume that $i_1 < i_2 < \cdots < i_m$, and we shall set

$$p_{i_1,\ldots,i_m} = M_{i_1,\ldots,i_m} \tag{10.4}$$

for all collections $i_1 < i_2 < \cdots < i_m$ of the numbers $1, \ldots, n$. Thus we assign to the
subspace $\mathsf{M}$ as many of the numbers $p_{i_1,\ldots,i_m}$ as there are combinations of $n$ things
taken $m$ at a time, that is, $\nu = C_n^m$. From formula (10.3) and the condition that the
rank of the matrix $M$ is equal to $m$, it follows that these numbers $p_{i_1,\ldots,i_m}$ cannot
all become zero simultaneously. On the other hand, formula (10.2) shows that in
replacing the basis $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ of the subspace $\mathsf{M}$ by some other basis
$\boldsymbol{b}_1, \ldots, \boldsymbol{b}_m$ of this subspace, all these numbers are simultaneously multiplied by some
number $c \neq 0$. Thus the numbers $p_{i_1,\ldots,i_m}$ for $i_1 < i_2 < \cdots < i_m$ can be taken as the
homogeneous coordinates of a point of the projective space $\mathbb{P}^{\nu-1} = \mathbb{P}(\mathsf{N})$, where
$\dim \mathsf{N} = \nu$ and $\dim \mathbb{P}(\mathsf{N}) = \nu - 1$.

Definition 10.1 The totality of numbers $p_{i_1,\ldots,i_m}$ in (10.4) for all collections
$i_1 < i_2 < \cdots < i_m$ taking the values $1, \ldots, n$ is called the Plücker coordinates of the
$m$-dimensional subspace $\mathsf{M} \subset \mathsf{L}$.

As we have seen, Plücker coordinates are defined only up to a common nonzero
factor; the collection of them must be understood as a point in the projective space
$\mathbb{P}^{\nu-1}$.
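
The construction of the minors and their behavior under a change of basis, as in (10.2), can be checked on a small example. This is a sketch under stated assumptions: the matrices `a` and `change` are arbitrary illustrative data, and the helper names `det` and `plucker` are hypothetical.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * entry * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, entry in enumerate(m[0]))

def plucker(rows):
    """All minors of order m, indexed by increasing tuples of 1-based column indices."""
    m, n = len(rows), len(rows[0])
    return {cols: det([[row[c - 1] for c in cols] for row in rows])
            for cols in combinations(range(1, n + 1), m)}

a = [[1, 2, 0, 1], [0, 1, 3, 1]]        # rows: coordinates of a basis a_1, a_2 of M
change = [[2, 1], [1, 3]]               # b_i = change[i][0] * a_1 + change[i][1] * a_2
c = det(change)
b = [[sum(change[i][k] * a[k][j] for k in range(2)) for j in range(4)] for i in range(2)]

pa, pb = plucker(a), plucker(b)         # six coordinates each, since C(4, 2) = 6
```

Every entry of `pb` equals `c` times the corresponding entry of `pa`, so both dictionaries describe the same point of the projective space $\mathbb{P}^5$.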

The simplest special case $m = 1$ returns us to the definition of projective space,
whose points correspond to one-dimensional subspaces $\langle \boldsymbol{a} \rangle$ of some vector space
$\mathsf{L}$. The numbers $p_{i_1,\ldots,i_m}$ in this case become the homogeneous coordinates of a point.
It is therefore not surprising that all of these depend on the choice of a coordinate
system (that is, a basis) of the space $\mathsf{L}$. Following tradition, in the sequel we shall
allow for a certain imprecision and call “Plücker coordinates” of the subspace $\mathsf{M}$
both a point of the projective space $\mathbb{P}^{\nu-1}$ and the collection of numbers $p_{i_1,\ldots,i_m}$
specified in this definition.

Theorem 10.2 The Plücker coordinates of a subspace $\mathsf{M} \subset \mathsf{L}$ uniquely determine
the subspace.

Proof Let us choose an arbitrary basis $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m$ of the subspace $\mathsf{M}$.
It uniquely determines (and not up to a common factor) the minors $M_{i_1,\ldots,i_m}$ without regard
to the order of the indices $i_1, \ldots, i_m$. The minors are uniquely determined by the
Plücker coordinates (10.4), according to formula (10.3).

A vector $\boldsymbol{x} \in \mathsf{L}$ belongs to the subspace
$\mathsf{M} = \langle \boldsymbol{a}_1, \ldots, \boldsymbol{a}_m \rangle$ if and only if the rank of the matrix

$$\overline{M} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ x_1 & x_2 & \cdots & x_n \end{pmatrix},$$

consisting of the coordinates of the vectors $\boldsymbol{a}_1, \ldots, \boldsymbol{a}_m, \boldsymbol{x}$ in some
(arbitrary) basis of the space $\mathsf{L}$, is equal to $m$, that is, if all the minors of order $m + 1$ of
the matrix $\overline{M}$ are equal to zero. Let us consider the minor that comprises the columns with
indices forming the subset $X = \{k_1, \ldots, k_{m+1}\}$ of the set $N_n = \{1, \ldots, n\}$, where we may
assume that $k_1 < k_2 < \cdots < k_{m+1}$. Expanding it along the last row, we obtain the
equality

$$\sum_{\alpha \in X} x_\alpha A_\alpha = 0, \tag{10.5}$$

where $A_\alpha$ is the cofactor of the element $x_\alpha$ in the minor under consideration. But by
definition, the minor corresponding to $A_\alpha$ is obtained from the matrix $\overline{M}$ by deleting
the last row and the column with index $\alpha$. Therefore, it coincides with one of the
minors of the matrix $M$, and the indices of its columns are obtained by deleting the
element $\alpha$ from the set $X$. For writing the sets thus obtained, one frequently uses the
convenient notation

$$\{k_1, \ldots, \breve{k}_\alpha, \ldots, k_{m+1}\},$$

where the notation $\breve{\ }$ signifies the omission of the element so indicated. Thus
relationship (10.5) can be written in the form

$$\sum_{j=1}^{m+1} (-1)^j x_{k_j} M_{k_1,\ldots,\breve{k}_j,\ldots,k_{m+1}} = 0. \tag{10.6}$$

Since the minors $M_{i_1,\ldots,i_m}$ of the matrix $M$ are expressed in Plücker coordinates
by formula (10.4), relationships (10.6), obtained from all possible subsets
$X = \{k_1, \ldots, k_{m+1}\}$ of the set $N_n$, also give expressions in terms of Plücker coordinates
of the condition $\boldsymbol{x} \in \mathsf{M}$, which completes the proof of the theorem. □
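
The proof suggests a membership test that uses only the Plücker coordinates. The sketch below is an illustration, not the book's algorithm: the function name `contains` is hypothetical, and the coordinates are those of the plane spanned by the rows (1, 2, 0, 1) and (0, 1, 3, 1), computed by hand as 2-by-2 minors.

```python
from itertools import combinations

def contains(p, m, n, x):
    """Check the relations (10.6) for every subset {k_1 < ... < k_{m+1}} of {1, ..., n}."""
    for ks in combinations(range(1, n + 1), m + 1):
        total = 0
        for j, k in enumerate(ks, start=1):
            rest = ks[:j - 1] + ks[j:]          # ks with the j-th index omitted
            total += (-1) ** j * x[k - 1] * p[rest]
        if total != 0:
            return False
    return True

# Pluecker coordinates of the plane spanned by (1, 2, 0, 1) and (0, 1, 3, 1).
p = {(1, 2): 1, (1, 3): 3, (1, 4): 1, (2, 3): 6, (2, 4): 1, (3, 4): -3}

inside = contains(p, 2, 4, [1, 4, 6, 3])     # a_1 + 2 a_2 lies in the plane
outside = contains(p, 2, 4, [1, 0, 0, 0])    # the first basis vector of L does not
```

Multiplying all entries of `p` by a common nonzero factor does not change the outcome, since each relation (10.6) is linear in the Plücker coordinates.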


By Theorem 10.2, Plücker coordinates uniquely define the subspace $\mathsf{M}$, but as a
rule, they cannot assume arbitrary values. It is true that for $m = 1$, the homogeneous
coordinates of a point of projective space can be chosen with arbitrary numbers
(of course, with the exception of the one collection consisting of all zeros). Another
equally simple case is $m = n - 1$, in which subspaces are hyperplanes corresponding
to points of $\mathbb{P}(\mathsf{L}^*)$. Hyperplanes are defined by their coordinates in this projective
space, which also can be chosen as arbitrary collections of numbers (again with
the exclusion of the collection consisting of all zeros). It is not difficult to verify
that these homogeneous coordinates can differ from Plücker coordinates only by
their signs, that is, by the factor $\pm 1$. However, as we shall now see, for an arbitrary
number $m < n$, the Plücker coordinates are connected to one another by certain
specific relationships.


Example 10.3 Let us consider the next case in order of complexity: $n = 4$, $m = 2$.
If we pass to projective spaces corresponding to $\mathsf{L}$ and $\mathsf{M}$, then this will give us a
description of the totality of projective lines in three-dimensional projective space
(the case considered by Plücker).

Since $n = 4$, $m = 2$, we have $\nu = C_4^2 = 6$, and consequently, each plane
$\mathsf{M} \subset \mathsf{L}$ has six Plücker coordinates:

$$p_{12},\ p_{13},\ p_{14},\ p_{23},\ p_{24},\ p_{34}. \tag{10.7}$$

It is easy to see that for an arbitrary basis of the space $\mathsf{L}$, we may always choose
a basis $\boldsymbol{a}, \boldsymbol{b}$ in the subspace $\mathsf{M}$ in such a way that the matrix $M$ given by formula
(10.1) will have the form

$$M = \begin{pmatrix} 1 & 0 & \alpha & \beta \\ 0 & 1 & \gamma & \delta \end{pmatrix}.$$

From this follow easily the values of the Plücker coordinates (10.7):

$$p_{12} = 1, \quad p_{13} = \gamma, \quad p_{14} = \delta, \quad p_{23} = -\alpha, \quad p_{24} = -\beta, \quad p_{34} = \alpha\delta - \beta\gamma,$$

which yields the relationship $p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$. In order to make this
homogeneous, we will use the fact that $p_{12} = 1$, and write it in the form

$$p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0. \tag{10.8}$$

The relationship (10.8) is already homogeneous, and therefore, it is preserved under
multiplication of all the Plücker coordinates (10.7) by an arbitrary nonzero factor $c$.
Thus relationship (10.8) remains valid for an arbitrary choice of Plücker coordinates,
and this means that it defines a point in some projective algebraic variety in
5-dimensional projective space.¹ In the following section, we shall study an analogous
question in the general case, for arbitrary dimension $m < n$.
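
Relationship (10.8) is easy to test on concrete numbers. The following sketch uses arbitrarily chosen values for $\alpha, \beta, \gamma, \delta$ and a second, unrelated pair of rows; the helper names `plucker_2x4` and `quadric` are hypothetical.

```python
from itertools import combinations

def plucker_2x4(row1, row2):
    """The six 2-by-2 minors p_ij (i < j, 1-based) of a matrix with rows row1, row2."""
    return {(i, j): row1[i - 1] * row2[j - 1] - row1[j - 1] * row2[i - 1]
            for i, j in combinations(range(1, 5), 2)}

def quadric(p):
    """Left-hand side of relationship (10.8)."""
    return p[1, 2] * p[3, 4] - p[1, 3] * p[2, 4] + p[1, 4] * p[2, 3]

alpha, beta, gamma, delta = 2, -3, 5, 7
p = plucker_2x4([1, 0, alpha, beta], [0, 1, gamma, delta])   # the special form above
q = plucker_2x4([3, 1, 4, 1], [5, 9, 2, 6])                  # an unrelated plane
scaled = {key: 4 * value for key, value in p.items()}        # a common factor c = 4
```

In each of the three cases `quadric` returns zero, which agrees with the homogeneity argument: multiplying all coordinates by $c$ multiplies the left-hand side of (10.8) by $c^2$.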


10.2 The Plücker Relations and the Grassmannian 

We shall now describe the relationships satisfied by Plücker coordinates of an
$m$-dimensional subspace $\mathsf{M}$ of an $n$-dimensional space $\mathsf{L}$ for arbitrary $n$ and $m$. Here
we shall use the following notation and conventions. Although in the definition
of Plücker coordinates $p_{i_1,\ldots,i_m}$ it was assumed that $i_1 < i_2 < \cdots < i_m$, now we
shall consider numbers $p_{i_1,\ldots,i_m}$ also with other collections of indices. Namely, if
$(j_1, \ldots, j_m)$ is an arbitrary collection of $m$ indices taking the values $1, \ldots, n$, then
we set

$$p_{j_1,\ldots,j_m} = 0 \tag{10.9}$$

if some two of the numbers $j_1, \ldots, j_m$ are equal, while if all the numbers $j_1, \ldots, j_m$
are distinct and $(i_1, \ldots, i_m)$ is their arrangement in ascending order, then we set

$$p_{j_1,\ldots,j_m} = \pm p_{i_1,\ldots,i_m}, \tag{10.10}$$

where the sign $+$ or $-$ depends on whether the permutation that takes $(j_1, \ldots, j_m)$
to $(i_1, \ldots, i_m)$ is even or odd (that is, whether the number of transpositions is even
or odd), according to Theorem 2.25.

¹ This variety is called a quadric.

In other words, in view of equality (10.3), let us set

$$p_{j_1,\ldots,j_m} = M_{j_1,\ldots,j_m}, \tag{10.11}$$

where $(j_1, \ldots, j_m)$ is an arbitrary collection of indices assuming the values $1, \ldots, n$.

Theorem 10.4 For every $m$-dimensional subspace $\mathsf{M}$ of an $n$-dimensional space $\mathsf{L}$
and for any two sets $(j_1, \ldots, j_{m-1})$ and $(k_1, \ldots, k_{m+1})$ of indices taking the values
$1, \ldots, n$, the following relationships hold:

$$\sum_{r=1}^{m+1} (-1)^r p_{j_1,\ldots,j_{m-1},k_r} \cdot p_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}} = 0. \tag{10.12}$$

These are called the Plücker relations.

The notation $k_1, \ldots, \breve{k}_r, \ldots, k_{m+1}$ means that we omit $k_r$ in the sequence
$k_1, \ldots, k_r, \ldots, k_{m+1}$.

Let us note that the indices among the numbers $p_{\alpha_1,\ldots,\alpha_m}$ entering relationship
(10.12) are not necessarily in ascending order, so they are not Plücker coordinates.
But with the aid of relationships (10.9) and (10.10), we can easily express them in
terms of Plücker coordinates. Therefore, relationship (10.12) may also be viewed as
a relationship among Plücker coordinates.

Proofof Theorem 10.4 Returning to the définition of Plücker coordinates in terms of 
the minors of the matrix (10.1) and using relationship (10.11), we see that equality 
(10.12) can be rewritten in the form 


/ 77+1 


£(-d' 


' M k U ..J r ,...,k m+1 - 0 ' 


r= 1 


(10.13) 


Let us show that relationship (10.13) holds for the minors of an arbitrary matrix of type $(m, n)$. To this end, let us expand the determinant $M_{j_1,\dots,j_{m-1},k_r}$ along the last column. Let us denote the cofactor of the element $a_{l k_r}$ of the last column of this determinant by $A_l$, $l = 1, \dots, m$. Thus the cofactor $A_l$ corresponds to the minor located in the rows and columns with indices $(1, \dots, \breve{l}, \dots, m)$ and $(j_1, \dots, j_{m-1})$ respectively; in particular, it does not depend on $r$. Then

$$M_{j_1,\dots,j_{m-1},k_r} = \sum_{l=1}^{m} a_{l k_r} A_l.$$

On substituting this expression into the left-hand side of relationship (10.13), we arrive at the equality

$$\sum_{r=1}^{m+1} (-1)^r M_{j_1,\dots,j_{m-1},k_r} M_{k_1,\dots,\breve{k}_r,\dots,k_{m+1}} = \sum_{r=1}^{m+1} (-1)^r \Biggl(\sum_{l=1}^{m} a_{l k_r} A_l\Biggr) M_{k_1,\dots,\breve{k}_r,\dots,k_{m+1}}.$$


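Changing the order of summation, we obtain

$$\sum_{r=1}^{m+1} (-1)^r M_{j_1,\dots,j_{m-1},k_r} M_{k_1,\dots,\breve{k}_r,\dots,k_{m+1}} = \sum_{l=1}^{m} \Biggl(\sum_{r=1}^{m+1} (-1)^r a_{l k_r} M_{k_1,\dots,\breve{k}_r,\dots,k_{m+1}}\Biggr) A_l.$$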


But the sum in parentheses is equal, up to sign, to the result of the expansion along the first row of the determinant of the square matrix of order $m + 1$ consisting of the columns of the matrix (10.1) numbered $k_1, \dots, k_{m+1}$ and rows numbered $l, 1, \dots, m$. This determinant is equal to

$$\begin{vmatrix} a_{l k_1} & a_{l k_2} & \cdots & a_{l k_{m+1}} \\ a_{1 k_1} & a_{1 k_2} & \cdots & a_{1 k_{m+1}} \\ a_{2 k_1} & a_{2 k_2} & \cdots & a_{2 k_{m+1}} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m k_1} & a_{m k_2} & \cdots & a_{m k_{m+1}} \end{vmatrix} = 0.$$

Indeed, for arbitrary $l = 1, \dots, m$, two of its rows (numbered $1$ and $l + 1$) coincide, and this means that the determinant is equal to zero. □
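As a numerical sanity check of Theorem 10.4, the following Python sketch evaluates the left-hand side of (10.13) for every pair of index collections of a small integer matrix. The helper names `det`, `minor`, and `plucker_sum`, as well as the sample matrix, are illustrative assumptions and not part of the text.

```python
from itertools import product

def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

def minor(matrix, idx):
    """Minor M_{idx}: columns idx (1-based, any order, repetitions allowed)."""
    return det([[row[j - 1] for j in idx] for row in matrix])

def plucker_sum(matrix, js, ks):
    """Left-hand side of (10.13) for js of length m-1 and ks of length m+1."""
    total = 0
    for r in range(1, len(ks) + 1):
        ks_without_r = ks[:r - 1] + ks[r:]  # omit k_r
        total += (-1) ** r * minor(matrix, js + (ks[r - 1],)) * minor(matrix, ks_without_r)
    return total

# Hypothetical 3 x 5 example matrix (m = 3, n = 5).
A = [[1, 2, 0, -1, 3],
     [0, 1, 4, 2, -2],
     [2, 0, 1, 1, 5]]
m, n = 3, 5
values = {plucker_sum(A, js, ks)
          for js in product(range(1, n + 1), repeat=m - 1)
          for ks in product(range(1, n + 1), repeat=m + 1)}
print(values)  # the set of all values taken by the left-hand side
```

The index collections are deliberately allowed to be unordered and to contain repetitions, since the proof above makes no assumption about them.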


Example 10.5 Let us return once more to the case $n = 4$, $m = 2$ considered in the previous section. Relationships (10.12) are here determined by subsets $(k)$ and $(l, m, n)$ of the set $\{1, 2, 3, 4\}$. If, for example, $k = 1$ and $l = 2$, $m = 3$, $n = 4$, then we obtain relationship (10.8) introduced earlier. It is easily verified that if all the numbers $k, l, m, n$ are distinct, then we obtain the same relationship (10.8), while if among them there are two that are equal, then relationship (10.12) is an identity (for the proof of this, we can use the antisymmetry of $p_{ij}$ with respect to $i$ and $j$). Therefore, in the general case, too (for arbitrary $m$ and $n$), relationships (10.12) among the Plücker coordinates are called the Plücker relations.
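The case $n = 4$, $m = 2$ is small enough to check by hand or by machine. The sketch below assumes that relationship (10.8), which is not reproduced in this section, has the classical form $p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$; the function names and the two sample vectors are hypothetical.

```python
from itertools import product

def p(a, b, i, j):
    """p_ij: the 2 x 2 minor in columns i, j (1-based) of the matrix with rows a, b."""
    return a[i - 1] * b[j - 1] - a[j - 1] * b[i - 1]

def relation(a, b, k, triple):
    """Left-hand side of (10.12) for m = 2 with index collections (k) and (l, m, n)."""
    l, m, n = triple
    return (-p(a, b, k, l) * p(a, b, m, n)
            + p(a, b, k, m) * p(a, b, l, n)
            - p(a, b, k, n) * p(a, b, l, m))

# Hypothetical basis of a plane in a 4-dimensional space.
a, b = (1, 2, 3, 4), (0, 1, -1, 5)
classical = (p(a, b, 1, 2) * p(a, b, 3, 4)
             - p(a, b, 1, 3) * p(a, b, 2, 4)
             + p(a, b, 1, 4) * p(a, b, 2, 3))
all_values = {relation(a, b, k, t)
              for k in range(1, 5)
              for t in product(range(1, 5), repeat=3)}
print(classical, relation(a, b, 1, (2, 3, 4)), all_values)
```

With $k = 1$ and $(l, m, n) = (2, 3, 4)$ the function `relation` is the negative of the classical expression, which is why both vanish together.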


We have seen that to each subspace $\mathsf{M}$ of given dimension $m$ of the space $\mathsf{L}$ of dimension $n$, there correspond its Plücker coordinates

$$p_{i_1,\dots,i_m}, \quad i_1 < i_2 < \cdots < i_m, \tag{10.14}$$

satisfying the relationships (10.12). Thus an m-dimensional subspace $\mathsf{M} \subset \mathsf{L}$ is determined by its Plücker coordinates (10.14), completely analogously to how points of a projective space are determined by their homogeneous coordinates (this is in fact a special case of Plücker coordinates for $m = 1$). However, for $m > 1$, the coordinates of the subspace $\mathsf{M}$ cannot be assigned arbitrarily: it is necessary that they satisfy relationships (10.12). Below, we shall prove that these relationships are also sufficient for the collection of numbers (10.14) to be Plücker coordinates of some m-dimensional subspace $\mathsf{M} \subset \mathsf{L}$. For this, we shall find the following geometric interpretation of Plücker coordinates useful.

Relationships (10.12) are homogeneous (of degree 2) with respect to the numbers $p_{i_1,\dots,i_m}$. After substitution on the basis of formulas (10.9) and (10.10), each of these relationships remains homogeneous, and thus they define a certain projective algebraic variety in the projective space $\mathbb{P}^{\nu-1}$, called a Grassmann variety or simply Grassmannian and denoted by $G(m,n)$.

We shall now investigate the Grassmannian G(m,n) in greater detail. 

As we have seen, $G(m,n)$ is contained in the projective space $\mathbb{P}^{\nu-1}$, where $\nu = C_n^m$ (see p. 351), and the homogeneous coordinates are written as the numbers (10.14) with all possible increasing collections of indices taking the values $1, \dots, n$. The space $\mathbb{P}^{\nu-1}$ is the union of affine subsets $U_{i_1,\dots,i_m}$, each of which is defined by the condition $p_{i_1,\dots,i_m} \neq 0$ for some choice of indices $i_1, \dots, i_m$. From this we obtain

$$G(m, n) = \bigcup_{i_1,\dots,i_m} \bigl(G(m, n) \cap U_{i_1,\dots,i_m}\bigr).$$

We shall investigate separately one of these subsets $G(m,n) \cap U_{i_1,\dots,i_m}$, for example, for simplicity, the subset with indices $(i_1, \dots, i_m) = (1, \dots, m)$. The general case is considered completely analogously and differs only in the numbering of the coordinates in the space $\mathbb{P}^{\nu-1}$. We may assume that for points of our affine subset $U_{1,\dots,m}$, the number $p_{1,\dots,m}$ is equal to 1.

Relationships (10.12) make it possible to express the Plücker coordinates (10.14) of the subspace $\mathsf{M}$ (or equivalently, the minors $M_{i_1,\dots,i_m}$ of the matrix (10.1)) as polynomials in those coordinates $p_{i_1,\dots,i_m}$ for which not more than one of the indices $i_1 < i_2 < \cdots < i_m$ exceeds $m$. Any such collection of indices obviously has the form $(1, \dots, \breve{r}, \dots, m, l)$, where $r \le m$ and $l > m$. Let us denote the Plücker coordinate corresponding to this collection by $\bar{p}_{rl}$, that is, we set $\bar{p}_{rl} = p_{1,\dots,\breve{r},\dots,m,l}$.

Let us consider an arbitrary ordered collection $j_1 < j_2 < \cdots < j_m$ of numbers between $1$ and $n$. If the indices $j_k$ are less than or equal to $m$ for all $k = 1, \dots, m$, then the collection $(j_1, j_2, \dots, j_m)$ coincides with the collection $(1, 2, \dots, m)$, and since the Plücker coordinate $p_{1,\dots,m}$ is equal to 1, there is nothing to prove. Thus we have only to consider the remaining case.

Let $j_k > m$ be one of the numbers $j_1 < j_2 < \cdots < j_m$. Let us use relationship (10.12), corresponding to the collection $(j_1, \dots, \breve{j}_k, \dots, j_m)$ of $m - 1$ numbers and the collection $(1, \dots, m, j_k)$ of $m + 1$ numbers. In this case, relationship (10.12) assumes the form

$$\sum_{r=1}^{m} (-1)^r\, p_{j_1,\dots,\breve{j}_k,\dots,j_m,r}\; p_{1,\dots,\breve{r},\dots,m,j_k} + (-1)^{m+1}\, p_{j_1,\dots,\breve{j}_k,\dots,j_m,j_k} = 0,$$

since $p_{1,\dots,m} = 1$. In view of the antisymmetry of the expression $p_{j_1,\dots,j_m}$, it follows that $p_{j_1,\dots,j_m} = \pm p_{j_1,\dots,\breve{j}_k,\dots,j_m,j_k}$ is equal to the sum (with alternating signs) of the products $p_{j_1,\dots,\breve{j}_k,\dots,j_m,r}\, \bar{p}_{r j_k}$. If among the numbers $j_1, \dots, j_m$ there were $s$ numbers exceeding $m$, then among the numbers $j_1, \dots, \breve{j}_k, \dots, j_m, r$ there would be already $s - 1$ of them.

Repeating this process as many times as necessary, we will obtain as a result an expression of the chosen Plücker coordinate $p_{j_1,\dots,j_m}$ in terms of the coordinates $\bar{p}_{rl}$, $r \le m$, $l > m$. We have thereby obtained the following important result.


Theorem 10.6 For each point in the set $G(m,n) \cap U_{1,\dots,m}$, all the Plücker coordinates (10.14) are polynomials in the coordinates $\bar{p}_{rl} = p_{1,\dots,\breve{r},\dots,m,l}$, $r \le m$, $l > m$.

Since the numbers $r$ and $l$ satisfy $1 \le r \le m$ and $m < l \le n$, it follows that all possible collections of coordinates $\bar{p}_{rl}$ form an affine space $V$ of dimension $m(n - m)$. By Theorem 10.6, all the remaining Plücker coordinates $p_{i_1,\dots,i_m}$ are polynomials in $\bar{p}_{rl}$, and therefore the coordinates $\bar{p}_{rl}$ uniquely define a point of the set $G(m,n) \cap U_{1,\dots,m}$. Thus is obtained a natural bijection (given by these polynomials) between points of the set $G(m,n) \cap U_{1,\dots,m}$ and points of the affine space $V$ of dimension $m(n - m)$. Of course, the same is true as well for points of any other set $G(m,n) \cap U_{i_1,\dots,i_m}$. In algebraic geometry, this fact is expressed by saying that the Grassmannian $G(m,n)$ is covered by the affine space of dimension $m(n - m)$.
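For $G(2,4)$ the content of Theorem 10.6 can be made completely explicit: in the chart $p_{12} = 1$ there are $m(n-m) = 4$ chart coordinates, and the single remaining coordinate $p_{34}$ is a quadratic polynomial in them. The Python sketch below illustrates this on one hypothetical point; it assumes the classical form $p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$ of the Plücker relation, and the variable names are not from the text.

```python
from itertools import combinations

# Hypothetical point of G(2, 4) in the chart p_12 = 1: rows (1, 0, *, *) and (0, 1, *, *).
a1, a2 = (1, 0, 7, -2), (0, 1, 3, 5)
p = {(i, j): a1[i - 1] * a2[j - 1] - a1[j - 1] * a2[i - 1]
     for i, j in combinations(range(1, 5), 2)}

# Chart coordinates pbar_{rl} = p_{1, ..., r omitted, ..., m, l} with r <= m = 2 < l.
pbar = {(1, 3): p[(2, 3)], (1, 4): p[(2, 4)],
        (2, 3): p[(1, 3)], (2, 4): p[(1, 4)]}

# p_34 recovered from the chart coordinates via the Pluecker relation with p_12 = 1:
# p_34 = p_13 p_24 - p_14 p_23.
p34_from_chart = pbar[(2, 3)] * pbar[(1, 4)] - pbar[(2, 4)] * pbar[(1, 3)]
print(p[(1, 2)], p[(3, 4)], p34_from_chart, len(pbar))
```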


Theorem 10.7 Every point of the Grassmannian $G(m,n)$ corresponds to some m-dimensional subspace $\mathsf{M} \subset \mathsf{L}$ as described in the previous section.


Proof Since the Grassmannian $G(m,n)$ is the union of sets $G(m,n) \cap U_{i_1,\dots,i_m}$, it suffices to prove the theorem for each set separately. We shall carry out the proof for the set $G(m,n) \cap U_{1,\dots,m}$, since the rest differ from it only in the numbering of coordinates.

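Let us choose an m-dimensional subspace $\mathsf{M} \subset \mathsf{L}$ and a basis $a_1, \dots, a_m$ in it so that in the associated matrix $M$ given by formula (10.1), the elements residing in its first $m$ columns take the form of the identity matrix $E$ of order $m$. Then the matrix $M$ has the form

$$M = \begin{pmatrix} 1 & 0 & \cdots & 0 & a_{1\,m+1} & \cdots & a_{1n} \\ 0 & 1 & \cdots & 0 & a_{2\,m+1} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & a_{m\,m+1} & \cdots & a_{mn} \end{pmatrix}. \tag{10.15}$$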


By Theorem 10.6, the Plücker coordinates (10.14) are polynomials in $\bar{p}_{rl} = p_{1,\dots,\breve{r},\dots,m,l}$. Moreover, by the definition of Plücker coordinates (10.4), we have $\bar{p}_{rl} = M_{1,\dots,\breve{r},\dots,m,l}$. Here, in the $r$th row of the minor $M_{1,\dots,\breve{r},\dots,m,l}$ of the matrix (10.15), all elements are equal to zero, except for the element in the last column of the minor (the $l$th column of the matrix), which is equal to $a_{rl}$. Expanding the minor $M_{1,\dots,\breve{r},\dots,m,l}$ along the $r$th row, and noting that this element stands in row $r$ and column $m$ of the minor while the complementary minor is the determinant of an identity matrix, we see that it is equal to $(-1)^{r+m} a_{rl}$. In other words, $\bar{p}_{rl} = (-1)^{r+m} a_{rl}$.

By our construction, all elements $a_{rl}$ of the matrix (10.15) can assume arbitrary values by the choice of a suitable subspace $\mathsf{M} \subset \mathsf{L}$ and basis $a_1, \dots, a_m$ in it. Thus the Plücker coordinates $\bar{p}_{rl}$ also assume arbitrary values. It remains to observe that by Theorem 10.6, all remaining Plücker coordinates are polynomials in $\bar{p}_{rl}$, and consequently, for the constructed subspace $\mathsf{M}$, they determine the given point of the set $G(m,n) \cap U_{1,\dots,m}$. □
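The sign in the formula relating the chart coordinates to the entries of the matrix (10.15) is easy to get wrong, so a small numerical check may be helpful. The following Python sketch builds a matrix of the form (10.15) with $m = 3$, $n = 5$ from hypothetical entries and compares each minor in columns $(1, \dots, \breve{r}, \dots, m, l)$ with $(-1)^{r+m} a_{rl}$; the helper `det` is an illustrative assumption.

```python
def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

m, n = 3, 5
tail = [[4, -1], [2, 7], [-3, 5]]  # hypothetical entries a_{rl} for l > m
M = [[1 if c == r else 0 for c in range(m)] + tail[r] for r in range(m)]

checks = []
for r in range(1, m + 1):
    for l in range(m + 1, n + 1):
        cols = [c for c in range(1, m + 1) if c != r] + [l]  # (1, ..., r omitted, ..., m, l)
        pbar_rl = det([[row[c - 1] for c in cols] for row in M])
        checks.append(pbar_rl == (-1) ** (r + m) * M[r - 1][l - 1])
print(all(checks))
```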


10.3 The Exterior Product 


Now we shall attempt to understand the sense in which the subspace $\mathsf{M} \subset \mathsf{L}$ is related to its Plücker coordinates, after separating out those parts of the construction that depend on the choice of bases $e_1, \dots, e_n$ in $\mathsf{L}$ and $a_1, \dots, a_m$ in $\mathsf{M}$ from those that do not depend on the choice of basis.

Our definition of Plücker coordinates was connected with the minors of the matrix $M$ given by formula (10.1), and since minors (like all determinants) are multilinear and antisymmetric functions of the rows (and columns), let us begin by recalling the appropriate definitions from Sect. 2.6 (especially because now we shall need them in a somewhat changed form). Namely, while in Chap. 2, we considered only functions of rows, now we shall consider functions of vectors belonging to an arbitrary vector space $\mathsf{L}$. We shall assume that the space $\mathsf{L}$ is finite-dimensional. Then by Theorem 3.64, it is isomorphic to the space of rows of length $n = \dim \mathsf{L}$, and so we might have used the definitions from Sect. 2.6. But such an isomorphism itself depends on the choice of basis in the space $\mathsf{L}$, and our goal is precisely to study the dependence of our construction on the choice of basis.

Definition 10.8 A function $F(x_1, \dots, x_m)$ in $m$ vectors of the space $\mathsf{L}$ taking numeric values is said to be multilinear if for every index $i$ in the range $1$ to $m$ and arbitrary fixed vectors $a_1, \dots, \breve{a}_i, \dots, a_m$,

$$F(a_1, \dots, a_{i-1}, x_i, a_{i+1}, \dots, a_m)$$

is a linear function of the vector $x_i$.


For $m = 1$, we arrive at the notion of linear function introduced in Sect. 3.7, and for $m = 2$, this is the notion of bilinear form, introduced in Sect. 6.1.

The definition of antisymmetric function given in Sect. 2.6 was valid for every set, and in particular, we may apply it to the set of all vectors of the space $\mathsf{L}$. According to this definition, for every pair of distinct indices $r$ and $s$ in the range $1$ to $m$, the relationship

$$F(x_1, \dots, x_r, \dots, x_s, \dots, x_m) = -F(x_1, \dots, x_s, \dots, x_r, \dots, x_m) \tag{10.16}$$

must be satisfied for every collection of vectors $x_1, \dots, x_m \in \mathsf{L}$. As proved in Sect. 2.6, it suffices to prove property (10.16) for $s = r + 1$, that is, when a transposition of two neighboring vectors from the collection $x_1, \dots, x_m$ is performed. Then property (10.16) will also be satisfied for arbitrary indices $r$ and $s$. In view of this, we shall often formulate the condition of antisymmetry only for “neighboring” indices and use the fact that it then holds for two arbitrary indices $r$ and $s$.

If the numeric values are elements of a field of characteristic different from 2, then it follows that $F(x_1, \dots, x_m) = 0$ if any two of the vectors $x_1, \dots, x_m$ coincide.

Let us denote by $\Pi^m(\mathsf{L})$ the collection of all multilinear functions of $m$ vectors of the space $\mathsf{L}$, and by $\Omega^m(\mathsf{L})$ the collection of all antisymmetric functions in $\Pi^m(\mathsf{L})$. The sets $\Pi^m(\mathsf{L})$ and $\Omega^m(\mathsf{L})$ become vector spaces if for all $F, G \in \Pi^m(\mathsf{L})$ we define their sum $H = F + G \in \Pi^m(\mathsf{L})$ by the formula

$$H(x_1, \dots, x_m) = F(x_1, \dots, x_m) + G(x_1, \dots, x_m)$$

and define for every function $F \in \Pi^m(\mathsf{L})$ the product by the scalar $\alpha$ as the function $H = \alpha F \in \Pi^m(\mathsf{L})$ according to the formula

$$H(x_1, \dots, x_m) = \alpha F(x_1, \dots, x_m).$$

It directly follows from these definitions that $\Pi^m(\mathsf{L})$ is thereby converted to a vector space, and $\Omega^m(\mathsf{L}) \subset \Pi^m(\mathsf{L})$ is a subspace of $\Pi^m(\mathsf{L})$.

Let $\dim \mathsf{L} = n$, and let $e_1, \dots, e_n$ be some basis of the space $\mathsf{L}$. It follows from the definition that the multilinear function $F(x_1, \dots, x_m)$ is defined for all collections of vectors $(x_1, \dots, x_m)$ if it is defined for those collections whose vectors $x_i$ belong to our basis. Indeed, repeating verbatim the arguments from Sect. 2.7 that we used in the proof of Theorem 2.29, we obtain for $F(x_1, \dots, x_m)$ the same formulas (2.40) and (2.43). Thus for the chosen basis $e_1, \dots, e_n$, the multilinear function $F(x_1, \dots, x_m)$ is determined by its values $F(e_{i_1}, \dots, e_{i_m})$, where $(i_1, \dots, i_m)$ are all possible collections of numbers from the set $\mathbb{N}_n = \{1, \dots, n\}$.

The previous line of reasoning shows that the space $\Pi^m(\mathsf{L})$ is isomorphic to the space of functions on the set $\mathbb{N}_n^m = \mathbb{N}_n \times \cdots \times \mathbb{N}_n$ (m-fold product). It follows that the dimension of the space $\Pi^m(\mathsf{L})$ is finite and coincides with the number of elements of the set $\mathbb{N}_n^m$. It is easy to verify that this number is equal to $n^m$, and so $\dim \Pi^m(\mathsf{L}) = n^m$.

As we observed in Example 3.36 (p. 94), in a space of functions $f$ on a finite set $\mathbb{N}_n^m$, there exists a basis consisting of $\delta$-functions assuming the value 1 on one element of $\mathbb{N}_n^m$ and the value 0 on all the other elements. In our case, we shall introduce a special notation for such a basis. Let $I = (i_1, \dots, i_m)$ be an arbitrary element of the set $\mathbb{N}_n^m$. Then we denote by $f_I$ the function taking the value 1 at the element $I$ and the value 0 on all remaining elements of the set $\mathbb{N}_n^m$.

We now move on to an examination of the subspace of antisymmetric multilinear functions $\Omega^m(\mathsf{L})$, assuming as previously that there has been chosen in $\mathsf{L}$ some basis $e_1, \dots, e_n$. To verify that a multilinear function $F$ is antisymmetric, it is necessary and sufficient that property (10.16) be satisfied for the vectors $e_i$ of the basis. In other words, this reduces to the relationships

$$F(e_{i_1}, \dots, e_{i_r}, \dots, e_{i_s}, \dots, e_{i_m}) = -F(e_{i_1}, \dots, e_{i_s}, \dots, e_{i_r}, \dots, e_{i_m})$$

for all collections of vectors $e_{i_1}, \dots, e_{i_m}$ in the chosen basis $e_1, \dots, e_n$ of the space $\mathsf{L}$. Therefore, for every function $F \in \Omega^m(\mathsf{L})$ and every collection $(j_1, \dots, j_m) \in \mathbb{N}_n^m$, we have the equality

$$F(e_{j_1}, \dots, e_{j_m}) = \pm F(e_{i_1}, \dots, e_{i_m}), \tag{10.17}$$

where the numbers $i_1, \dots, i_m$ are the same as $j_1, \dots, j_m$, but arranged in ascending order $i_1 < i_2 < \cdots < i_m$, while the sign $+$ or $-$ in (10.17) depends on whether the number of transpositions necessary for passing from the collection $(i_1, \dots, i_m)$ to the collection $(j_1, \dots, j_m)$ is even or odd (we note that if any two of the numbers $j_1, \dots, j_m$ are equal, then both sides of equality (10.17) become equal to zero).
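The bookkeeping behind equality (10.17) can be sketched in a few lines of Python. The function below (a hypothetical helper, not from the text) returns the increasing rearrangement of an index collection together with the sign; it uses the standard fact that the parity of the number of transpositions equals the parity of the number of inversions.

```python
def sort_with_sign(js):
    """Return (sign, increasing tuple) as in (10.17); the sign is 0 when an index repeats."""
    if len(set(js)) < len(js):
        return 0, tuple(sorted(js))
    inversions = sum(1 for s in range(len(js))
                     for t in range(s + 1, len(js)) if js[s] > js[t])
    return (-1) ** inversions, tuple(sorted(js))

print(sort_with_sign((3, 1, 2)), sort_with_sign((2, 1, 3)), sort_with_sign((1, 1, 2)))
```

So, for an antisymmetric $F$, the value at $(e_3, e_1, e_2)$ equals the value at $(e_1, e_2, e_3)$, the value at $(e_2, e_1, e_3)$ is its negative, and the value at $(e_1, e_1, e_2)$ is zero.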

Reasoning just as in the case of the space $\Pi^m(\mathsf{L})$, we conclude that the space $\Omega^m(\mathsf{L})$ is isomorphic to the space of functions on the set $\overrightarrow{\mathbb{N}}_n^m \subset \mathbb{N}_n^m$, which consists of all increasing sets $I = (i_1, \dots, i_m)$, that is, those for which $i_1 < i_2 < \cdots < i_m$. From this it follows in particular that $\Omega^m(\mathsf{L}) = (0)$ if $m > n$. It is easy to see that the number of such increasing sets $I$ is equal to $C_n^m$, and therefore,

$$\dim \Omega^m(\mathsf{L}) = C_n^m. \tag{10.18}$$
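The two dimension counts, $n^m$ for multilinear functions and $C_n^m$ for antisymmetric ones, amount to counting index collections. A minimal Python sketch, with the values $n = 5$, $m = 3$ chosen only for illustration:

```python
from itertools import combinations, product
from math import comb

n, m = 5, 3
all_index_tuples = list(product(range(1, n + 1), repeat=m))   # all collections (i_1, ..., i_m)
increasing_tuples = list(combinations(range(1, n + 1), m))    # increasing collections only
too_many = list(combinations(range(1, 3), 3))                 # m = 3 > n = 2: no increasing sets

print(len(all_index_tuples), n ** m)
print(len(increasing_tuples), comb(n, m))
print(len(too_many))
```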


We shall denote by $F_I$ the $\delta$-function of the space $\Omega^m(\mathsf{L})$, taking the value 1 on the set $I \in \overrightarrow{\mathbb{N}}_n^m$ and the value 0 on all the remaining sets in $\overrightarrow{\mathbb{N}}_n^m$.

The vectors $a_1, \dots, a_m \in \mathsf{L}$ determine on the space $\Omega^m(\mathsf{L})$ a linear function $\varphi$ given by the relationship

$$\varphi(F) = F(a_1, \dots, a_m) \tag{10.19}$$

for an arbitrary element $F \in \Omega^m(\mathsf{L})$. Thus $\varphi$ is a linear function on $\Omega^m(\mathsf{L})$, that is, an element of the dual space $\Omega^m(\mathsf{L})^*$.


Definition 10.9 The dual space $\Lambda^m(\mathsf{L}) = \Omega^m(\mathsf{L})^*$ is called the space of m-vectors or the mth exterior power of the space $\mathsf{L}$, and its elements are called m-vectors. A vector $\varphi \in \Lambda^m(\mathsf{L})$ constructed with the help of relationship (10.19) involving the vectors $a_1, \dots, a_m$ is called the exterior product (or wedge product) of $a_1, \dots, a_m$ and is denoted by

$$\varphi = a_1 \wedge a_2 \wedge \cdots \wedge a_m.$$


Now let us explore the connection between the exterior product and Plücker coordinates of the subspace $\mathsf{M} \subset \mathsf{L}$. To this end, it is necessary to choose some basis $e_1, \dots, e_n$ in $\mathsf{L}$ and some basis $a_1, \dots, a_m$ in $\mathsf{M}$. The Plücker coordinates of the subspace $\mathsf{M}$ take the form (10.4), where $M_{i_1,\dots,i_m}$ is the minor of the matrix (10.1) that resides in columns $i_1, \dots, i_m$ and is an antisymmetric function of its columns. Let us introduce for the Plücker coordinates and associated minors the notation

$$p_I = p_{i_1,\dots,i_m}, \quad M_I = M_{i_1,\dots,i_m}, \quad \text{where } I = (i_1, \dots, i_m) \in \overrightarrow{\mathbb{N}}_n^m.$$

To the basis of the space $\Omega^m(\mathsf{L})$ consisting of $\delta$-functions $F_I$, there corresponds the dual basis of the dual space $\Lambda^m(\mathsf{L})$, whose vectors we shall denote by $\varphi_I$. Using the notation that we introduced in Sect. 3.7, we may say that the dual basis is defined by the condition

$$(F_I, \varphi_I) = 1 \ \text{ for all } I \in \overrightarrow{\mathbb{N}}_n^m, \qquad (F_I, \varphi_J) = 0 \ \text{ for all } I \neq J. \tag{10.20}$$

In particular, the vector $\varphi = a_1 \wedge a_2 \wedge \cdots \wedge a_m$ of the space $\Lambda^m(\mathsf{L})$ can be expressed as a linear combination of vectors in this basis:

$$\varphi = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} \lambda_I \varphi_I \tag{10.21}$$

with certain coefficients $\lambda_I$. Using formulas (10.19) and (10.20), we obtain the following equality:

$$\lambda_I = \varphi(F_I) = F_I(a_1, \dots, a_m).$$

For determining the values $F_I(a_1, \dots, a_m)$, we may make use of Theorem 2.29; see formulas (2.40) and (2.43). Since $F_I(e_{j_1}, \dots, e_{j_m}) = 0$ when the indices of $e_{j_1}, \dots, e_{j_m}$ form a collection $J \neq I$, it follows from formula (2.43) that the values $F_I(a_1, \dots, a_m)$ depend only on the elements appearing in the minor $M_I$. The minor $M_I$ is a multilinear and antisymmetric function of its rows. In view of the fact that by definition, $F_I(e_{i_1}, \dots, e_{i_m}) = 1$, we obtain from Theorem 2.15 that $F_I(a_1, \dots, a_m) = M_I = p_I$. In other words, we have the equality

$$\varphi = a_1 \wedge a_2 \wedge \cdots \wedge a_m = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I \varphi_I = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} p_I \varphi_I. \tag{10.22}$$

Thus any collection of $m$ vectors $a_1, \dots, a_m$ uniquely determines the vector $a_1 \wedge \cdots \wedge a_m$ in the space $\Lambda^m(\mathsf{L})$, where the Plücker coordinates of the subspace $\langle a_1, \dots, a_m \rangle$ are the coordinates of this vector $a_1 \wedge \cdots \wedge a_m$ with respect to the basis $\varphi_I$, $I \in \overrightarrow{\mathbb{N}}_n^m$, of the space $\Lambda^m(\mathsf{L})$. Like all coordinates, they depend on this basis, which itself is constructed as the dual basis to some basis of the space $\Omega^m(\mathsf{L})$.
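Formula (10.22) gives a direct recipe for computing the coordinates of an exterior product: list the increasing index collections and take the corresponding minors. The Python sketch below follows that recipe; the function names `det` and `wedge` and the sample vectors are assumptions made for illustration.

```python
from itertools import combinations

def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

def wedge(vectors):
    """Coordinates p_I of a_1 ^ ... ^ a_m in the basis phi_I, keyed by increasing I (1-based)."""
    m, n = len(vectors), len(vectors[0])
    return {I: det([[v[i - 1] for i in I] for v in vectors])
            for I in combinations(range(1, n + 1), m)}

# Hypothetical vectors in a 3-dimensional space, given by coordinates in e_1, e_2, e_3.
coords = wedge([(1, 2, 3), (0, 1, 4)])
print(coords)
```

For these two vectors the three coordinates are the minors in columns $(1,2)$, $(1,3)$, and $(2,3)$ of the $2 \times 3$ matrix whose rows are the vectors.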

Definition 10.10 A vector $x \in \Lambda^m(\mathsf{L})$ is said to be decomposable if it can be represented as an exterior product

$$x = a_1 \wedge a_2 \wedge \cdots \wedge a_m \tag{10.23}$$

with some $a_1, \dots, a_m \in \mathsf{L}$.

Let the m-vector $x$ have coordinates $x_{i_1,\dots,i_m}$ in some basis $\varphi_I$, $I \in \overrightarrow{\mathbb{N}}_n^m$, of the space $\Lambda^m(\mathsf{L})$. As in the case of an arbitrary vector space, the coordinates $x_{i_1,\dots,i_m}$ can assume arbitrary values in the associated field. In order for an m-vector $x$ to be decomposable, that is, for it to satisfy the relationship (10.23) with some vectors $a_1, \dots, a_m \in \mathsf{L}$, it is necessary and sufficient that its coordinates $x_{i_1,\dots,i_m}$ coincide with the Plücker coordinates $p_{i_1,\dots,i_m}$ of the subspace $\mathsf{M} = \langle a_1, \dots, a_m \rangle$ in $\mathsf{L}$. But as we established in the previous section, the collection of Plücker coordinates of a subspace $\mathsf{M} \subset \mathsf{L}$ cannot be an arbitrary collection of $\nu$ numbers, but only one that satisfies the Plücker relations (10.12). Consequently, the Plücker relations give necessary and sufficient conditions for an m-vector $x$ to be decomposable.

Thus for the specification of m-dimensional subspaces $\mathsf{M} \subset \mathsf{L}$, we need only the decomposable m-vectors (the indecomposable m-vectors correspond to no m-dimensional subspace). However, generally speaking, the decomposable vectors do not form a vector space (the sum of two decomposable vectors might be an indecomposable vector), and also, as is easily verified, the set of decomposable vectors is not contained in any subspace of the space $\Lambda^m(\mathsf{L})$ other than $\Lambda^m(\mathsf{L})$ itself. In many problems, it is more natural to deal with vector spaces, and this is the reason for introducing the notion of a space $\Lambda^m(\mathsf{L})$ that contains all m-vectors, including those that are indecomposable.
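A standard example of a sum of two decomposable vectors that is not decomposable is $e_1 \wedge e_2 + e_3 \wedge e_4$ in a 4-dimensional space. The Python sketch below checks it against the Plücker relation for $m = 2$, $n = 4$, assumed here to have the classical form $p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$; the function name `q` is hypothetical.

```python
def q(x):
    """Pluecker quadric for m = 2, n = 4; x maps increasing pairs (i, j) to coordinates."""
    return x[(1, 2)] * x[(3, 4)] - x[(1, 3)] * x[(2, 4)] + x[(1, 4)] * x[(2, 3)]

pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
e12 = {I: int(I == (1, 2)) for I in pairs}    # coordinates of e_1 ^ e_2
e34 = {I: int(I == (3, 4)) for I in pairs}    # coordinates of e_3 ^ e_4
total = {I: e12[I] + e34[I] for I in pairs}   # coordinates of e_1 ^ e_2 + e_3 ^ e_4

print(q(e12), q(e34), q(total))
```

Each summand satisfies the relation, while the sum does not, so the sum cannot be written as a single exterior product of two vectors.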

Let us note that the basis vectors $\varphi_I$ themselves are decomposable: they are determined by the conditions (10.20), which, as is easily verified, taking into account the equality $(F_J, \varphi_I) = F_J(e_{i_1}, \dots, e_{i_m})$, means that for a vector $x = \varphi_I$, we have the representation (10.23) for $a_1 = e_{i_1}, \dots, a_m = e_{i_m}$, that is,

$$\varphi_I = e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_m}, \quad I = (i_1, \dots, i_m).$$

If $e_1, \dots, e_n$ is a basis of the space $\mathsf{L}$, then the vectors $e_{i_1} \wedge \cdots \wedge e_{i_m}$ for all possible increasing collections of indices $(i_1, \dots, i_m)$ form a basis of the space $\Lambda^m(\mathsf{L})$, dual to the basis $F_I$ of the space $\Omega^m(\mathsf{L})$ that we considered above. Thus every m-vector is a linear combination of decomposable vectors.

The exterior product $a_1 \wedge \cdots \wedge a_m$ is a function of $m$ vectors $a_i \in \mathsf{L}$ with values in the space $\Lambda^m(\mathsf{L})$. Let us now establish some of its properties. The first two of these are an analogue of multilinearity, and the third is an analogue of antisymmetry, but taking into account that the exterior product is not a number, but a vector of the space $\Lambda^m(\mathsf{L})$.

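Property 10.11 For every $i \in \{1, \dots, m\}$ and all vectors $a_i, b, c \in \mathsf{L}$ the following relationship is satisfied:

$$\begin{aligned} a_1 \wedge \cdots \wedge a_{i-1} \wedge (b + c) \wedge a_{i+1} \wedge \cdots \wedge a_m &= a_1 \wedge \cdots \wedge a_{i-1} \wedge b \wedge a_{i+1} \wedge \cdots \wedge a_m \\ &\quad + a_1 \wedge \cdots \wedge a_{i-1} \wedge c \wedge a_{i+1} \wedge \cdots \wedge a_m. \end{aligned} \tag{10.24}$$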


Indeed, by definition, the exterior product

$$a_1 \wedge \cdots \wedge a_{i-1} \wedge (b + c) \wedge a_{i+1} \wedge \cdots \wedge a_m$$

is a linear function on the space $\Omega^m(\mathsf{L})$ associating with each function $F \in \Omega^m(\mathsf{L})$ the number $F(a_1, \dots, a_{i-1}, b + c, a_{i+1}, \dots, a_m)$. Since the function $F$ is multilinear, it follows that

$$F(a_1, \dots, a_{i-1}, b + c, a_{i+1}, \dots, a_m) = F(a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_m) + F(a_1, \dots, a_{i-1}, c, a_{i+1}, \dots, a_m),$$

which proves equality (10.24).

The following two properties are just as easily verified. 

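Property 10.12 For every number $\alpha$ and all vectors $a_i \in \mathsf{L}$, the following relationship holds:

$$a_1 \wedge \cdots \wedge a_{i-1} \wedge (\alpha a_i) \wedge a_{i+1} \wedge \cdots \wedge a_m = \alpha\,(a_1 \wedge \cdots \wedge a_{i-1} \wedge a_i \wedge a_{i+1} \wedge \cdots \wedge a_m). \tag{10.25}$$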

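Property 10.13 For all pairs of indices $r, s \in \{1, \dots, m\}$ and all vectors $a_i \in \mathsf{L}$, the following relationship holds:

$$\begin{aligned} &a_1 \wedge \cdots \wedge a_{s-1} \wedge a_s \wedge a_{s+1} \wedge \cdots \wedge a_{r-1} \wedge a_r \wedge a_{r+1} \wedge \cdots \wedge a_m \\ &\quad = -\,a_1 \wedge \cdots \wedge a_{s-1} \wedge a_r \wedge a_{s+1} \wedge \cdots \wedge a_{r-1} \wedge a_s \wedge a_{r+1} \wedge \cdots \wedge a_m, \end{aligned} \tag{10.26}$$

that is, if any two vectors from among $a_1, \dots, a_m$ change places, the exterior product changes sign.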

If (as we assume) the numbers are elements of a field of characteristic different from 2 (for example, $\mathbb{R}$ or $\mathbb{C}$), then Property 10.13 yields the following corollary.

Corollary 10.14 If any two of the vectors $a_1, \dots, a_m$ are equal, then $a_1 \wedge \cdots \wedge a_m = 0$.

Generalizing the definition given above, we may express Properties 10.11, 10.12, and 10.13 by saying that the exterior product $a_1 \wedge \cdots \wedge a_m$ is a multilinear antisymmetric function of the vectors $a_1, \dots, a_m \in \mathsf{L}$ taking values in the space $\Lambda^m(\mathsf{L})$.
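In coordinates, Properties 10.11, 10.12, and 10.13 reduce to familiar properties of determinants, and they can be spot-checked numerically. The following Python sketch does so for three vectors of a 4-dimensional space; it assumes the coordinate description of the exterior product by minors, and all names and sample vectors are illustrative.

```python
from itertools import combinations

def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

def wedge(vectors):
    """Coordinates of a_1 ^ ... ^ a_m as minors, keyed by increasing index tuples."""
    m, n = len(vectors), len(vectors[0])
    return {I: det([[v[i - 1] for i in I] for v in vectors])
            for I in combinations(range(1, n + 1), m)}

def add(x, y):
    return {I: x[I] + y[I] for I in x}

def scale(alpha, x):
    return {I: alpha * x[I] for I in x}

a1, b, c, a3 = (1, 0, 2, -1), (3, 1, 0, 2), (0, 4, 1, 1), (2, -1, 1, 0)
b_plus_c = tuple(s + t for s, t in zip(b, c))
additive = wedge([a1, b_plus_c, a3]) == add(wedge([a1, b, a3]), wedge([a1, c, a3]))
homogeneous = wedge([a1, tuple(5 * s for s in b), a3]) == scale(5, wedge([a1, b, a3]))
antisymmetric = wedge([a3, b, a1]) == scale(-1, wedge([a1, b, a3]))
print(additive, homogeneous, antisymmetric)
```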

Property 10.15 Vectors $a_1, \dots, a_m$ are linearly dependent if and only if

$$a_1 \wedge \cdots \wedge a_m = 0. \tag{10.27}$$

Proof Let us assume that the vectors $a_1, \dots, a_m$ are linearly dependent. Then one of them is a linear combination of the rest. Let it be the vector $a_m$ (the other cases are reduced to this one by a change in numbering). Then

$$a_m = \alpha_1 a_1 + \cdots + \alpha_{m-1} a_{m-1},$$

and on the basis of Properties 10.11 and 10.12, we obtain that

$$a_1 \wedge \cdots \wedge a_{m-1} \wedge a_m = \alpha_1 (a_1 \wedge \cdots \wedge a_{m-1} \wedge a_1) + \cdots + \alpha_{m-1} (a_1 \wedge \cdots \wedge a_{m-1} \wedge a_{m-1}).$$

In view of Corollary 10.14, each term on the right-hand side of this equality is equal to zero, and consequently, we have $a_1 \wedge \cdots \wedge a_m = 0$.

Let us assume now that the vectors $a_1, \dots, a_m$ are linearly independent. We must prove that $a_1 \wedge \cdots \wedge a_m \neq 0$. Equality (10.27) would mean that the function $a_1 \wedge \cdots \wedge a_m$ (as an element of the space $\Lambda^m(\mathsf{L})$) assigns to an arbitrary function $F \in \Omega^m(\mathsf{L})$ the value $F(a_1, \dots, a_m) = 0$. However, in contradiction to this, it is possible to produce a function $F \in \Omega^m(\mathsf{L})$ for which $F(a_1, \dots, a_m) \neq 0$. Indeed, let us represent the space $\mathsf{L}$ as a direct sum

$$\mathsf{L} = \langle a_1, \dots, a_m \rangle \oplus \mathsf{L}',$$

where $\mathsf{L}' \subset \mathsf{L}$ is some subspace of dimension $n - m$, and for every vector $z \in \mathsf{L}$, let us consider the corresponding decomposition $z = x + y$, where $x \in \langle a_1, \dots, a_m \rangle$ and $y \in \mathsf{L}'$. Finally, for vectors

$$z_i = \alpha_{i1} a_1 + \cdots + \alpha_{im} a_m + y_i, \quad y_i \in \mathsf{L}', \quad i = 1, \dots, m,$$

let us define a function $F$ by the condition $F(z_1, \dots, z_m) = |(\alpha_{ij})|$. As we saw in Sect. 2.6, the determinant is a multilinear antisymmetric function of its rows, and therefore $F \in \Omega^m(\mathsf{L})$. Moreover, $F(a_1, \dots, a_m) = |E| = 1$, which proves our assertion. □
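Property 10.15 gives a practical test for linear dependence: compute all the coordinates of the exterior product and see whether they all vanish. A minimal Python sketch under the same coordinate assumptions as before (minors as coordinates; hypothetical helper names and sample vectors):

```python
from itertools import combinations

def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

def wedge(vectors):
    """Coordinates of a_1 ^ ... ^ a_m as minors, keyed by increasing index tuples."""
    m, n = len(vectors), len(vectors[0])
    return {I: det([[v[i - 1] for i in I] for v in vectors])
            for I in combinations(range(1, n + 1), m)}

a1, a2 = (1, 2, 0, 1), (0, 1, 1, -1)
combo = tuple(2 * s - 3 * t for s, t in zip(a1, a2))   # 2 a1 - 3 a2, dependent on a1, a2
dependent = wedge([a1, a2, combo])
independent = wedge([a1, a2, (0, 0, 1, 0)])

print(all(v == 0 for v in dependent.values()),
      any(v != 0 for v in independent.values()))
```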

Let $\mathsf{L}$ and $\mathsf{M}$ be arbitrary vector spaces, and let $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ be a linear transformation. It defines the transformation

$$\Omega^p(\mathcal{A}) : \Omega^p(\mathsf{M}) \to \Omega^p(\mathsf{L}), \tag{10.28}$$

which assigns to each antisymmetric function $F(y_1, \dots, y_p)$ in the space $\Omega^p(\mathsf{M})$ an antisymmetric function $G(x_1, \dots, x_p)$ in the space $\Omega^p(\mathsf{L})$ by the formula

$$G(x_1, \dots, x_p) = F(\mathcal{A}(x_1), \dots, \mathcal{A}(x_p)), \quad x_1, \dots, x_p \in \mathsf{L}. \tag{10.29}$$

A simple verification shows that this transformation is linear. Let us note that we have already met with such a transformation in the case $p = 1$, namely the dual transformation $\mathcal{A}^* : \mathsf{M}^* \to \mathsf{L}^*$ (see Sect. 3.7). In the general case, passing to the dual spaces $\Lambda^p(\mathsf{L}) = \Omega^p(\mathsf{L})^*$ and $\Lambda^p(\mathsf{M}) = \Omega^p(\mathsf{M})^*$, we define the linear transformation

$$\Lambda^p(\mathcal{A}) : \Lambda^p(\mathsf{L}) \to \Lambda^p(\mathsf{M}), \tag{10.30}$$

dual to the transformation (10.28).

Let us note the most important properties of the transformation (10.30).

Lemma 10.16 Let $\mathcal{A} : \mathsf{L} \to \mathsf{M}$ and $\mathcal{B} : \mathsf{M} \to \mathsf{N}$ be linear transformations of arbitrary vector spaces $\mathsf{L}$, $\mathsf{M}$, $\mathsf{N}$. Then

$$\Lambda^p(\mathcal{B}\mathcal{A}) = \Lambda^p(\mathcal{B})\,\Lambda^p(\mathcal{A}).$$

Proof In view of the definition (10.30) and the properties of dual transformations (formula (3.61)) established in Sect. 3.7, it suffices to ascertain that

$$\Omega^p(\mathcal{B}\mathcal{A}) = \Omega^p(\mathcal{A})\,\Omega^p(\mathcal{B}). \tag{10.31}$$

But equality (10.31) follows directly from the definition. Indeed, the transformation $\Omega^p(\mathcal{A})$ maps the function $F(y_1, \dots, y_p)$ in the space $\Omega^p(\mathsf{M})$ to the function $G(x_1, \dots, x_p)$ in $\Omega^p(\mathsf{L})$ by formula (10.29). In just the same way, the transformation $\Omega^p(\mathcal{B})$ maps the function $H(z_1, \dots, z_p)$ in $\Omega^p(\mathsf{N})$ to the function $F(y_1, \dots, y_p)$ in $\Omega^p(\mathsf{M})$ by the analogous formula

$$F(y_1, \dots, y_p) = H(\mathcal{B}(y_1), \dots, \mathcal{B}(y_p)), \quad y_1, \dots, y_p \in \mathsf{M}. \tag{10.32}$$

Finally, the transformation $\mathcal{B}\mathcal{A} : \mathsf{L} \to \mathsf{N}$ takes the function $H(z_1, \dots, z_p)$ in the space $\Omega^p(\mathsf{N})$ to the function $G(x_1, \dots, x_p)$ in the space $\Omega^p(\mathsf{L})$ by the formula

$$G(x_1, \dots, x_p) = H(\mathcal{B}\mathcal{A}(x_1), \dots, \mathcal{B}\mathcal{A}(x_p)), \quad x_1, \dots, x_p \in \mathsf{L}. \tag{10.33}$$

Substituting into (10.33) the vectors $y_i = \mathcal{A}(x_i)$ and comparing the relationship thus obtained with (10.32), we obtain the required equality (10.31). □
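In coordinates, Lemma 10.16 corresponds to a classical identity for minors of a product of matrices (the Cauchy-Binet formula). The Python sketch below takes for granted the standard fact that, in the bases formed by exterior products of basis vectors, the matrix of $\Lambda^p(\mathcal{A})$ consists of the $p \times p$ minors of the matrix of $\mathcal{A}$; the names `compound` and `matmul` and both sample matrices are assumptions made for illustration.

```python
from itertools import combinations

def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

def compound(A, p):
    """Matrix of p x p minors of A: entry (J, I) is the minor in rows J and columns I."""
    rows, cols = len(A), len(A[0])
    return [[det([[A[j][i] for i in I] for j in J])
             for I in combinations(range(cols), p)]
            for J in combinations(range(rows), p)]

def matmul(X, Y):
    """Ordinary matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 2, 0, 1], [0, 1, 3, -1], [2, 0, 1, 1]]      # hypothetical map, 4-dim to 3-dim space
B = [[1, 0, 2], [0, 1, 1], [3, -1, 0], [1, 1, 1]]    # hypothetical map, 3-dim to 4-dim space
p = 2
functorial = compound(matmul(B, A), p) == matmul(compound(B, p), compound(A, p))
print(functorial)
```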

Lemma 10.17 For all vectors $x_1, \dots, x_p \in \mathsf{L}$, we have the equality

$$\Lambda^p(\mathcal{A})(x_1 \wedge \cdots \wedge x_p) = \mathcal{A}(x_1) \wedge \cdots \wedge \mathcal{A}(x_p). \tag{10.34}$$

Proof Both sides of equality (10.34) are elements of the space $\Lambda^p(\mathsf{M}) = \Omega^p(\mathsf{M})^*$, that is, they are linear functions on $\Omega^p(\mathsf{M})$. It suffices to verify that their application to any function $F(y_1, \dots, y_p)$ in the space $\Omega^p(\mathsf{M})$ gives one and the same result. But as follows from the definition, in both cases, this result is equal to $F(\mathcal{A}(x_1), \dots, \mathcal{A}(x_p))$. □
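Equality (10.34) can be illustrated numerically for $p = 2$. The Python sketch below assumes that $\Lambda^2(\mathcal{A})$ is the linear extension of $e_i \wedge e_j \mapsto \mathcal{A}(e_i) \wedge \mathcal{A}(e_j)$, so that in coordinates it acts through the $2 \times 2$ minors of the matrix of $\mathcal{A}$, and it checks that this extension agrees with $\mathcal{A}(x_1) \wedge \mathcal{A}(x_2)$ for two non-basis vectors. All names and sample data are hypothetical.

```python
from itertools import combinations

def det(rows):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** c * rows[0][c] * det([r[:c] + r[c + 1:] for r in rows[1:]])
               for c in range(len(rows)))

def wedge(vectors):
    """Coordinates of a_1 ^ ... ^ a_m as minors, keyed by increasing index tuples."""
    m, n = len(vectors), len(vectors[0])
    return {I: det([[v[i - 1] for i in I] for v in vectors])
            for I in combinations(range(1, n + 1), m)}

def apply(A, x):
    """Image of the coordinate vector x under the matrix A."""
    return tuple(sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A)))

A = [[1, 2, 0, 1], [0, 1, 3, -1], [2, 0, 1, 1]]   # hypothetical map, 4-dim to 3-dim space
x1, x2 = (1, 0, 2, -1), (0, 3, 1, 1)

right_side = wedge([apply(A, x1), apply(A, x2)])   # A(x1) ^ A(x2)
source = wedge([x1, x2])                           # x1 ^ x2
left_side = {J: sum(det([[A[j - 1][i - 1] for i in I] for j in J]) * source[I]
                    for I in source)
             for J in right_side}                  # Lambda^2(A) applied to x1 ^ x2
print(left_side == right_side)
```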

Finally, we shall prove a property of the exterior product that is sometimes called 
universality. 

Property 10.18 Any mapping that assigns to $m$ vectors $a_1, \dots, a_m$ of the space $\mathsf{L}$ a vector $[a_1, \dots, a_m]$ of some space $\mathsf{M}$ and satisfies Properties 10.11, 10.12, 10.13 (p. 362) can be obtained from the exterior product $a_1 \wedge \cdots \wedge a_m$ by applying some uniquely defined linear transformation $\mathcal{A} : \Lambda^m(\mathsf{L}) \to \mathsf{M}$.

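In other words, there exists a linear transformation $\mathcal{A} : \Lambda^m(\mathsf{L}) \to \mathsf{M}$ such that for every collection $a_1, \dots, a_m$ of vectors of the space $\mathsf{L}$, we have the equality

$$[a_1, \dots, a_m] = \mathcal{A}(a_1 \wedge \cdots \wedge a_m), \tag{10.35}$$

which can be represented by the following diagram:

$$\mathsf{L}^m \xrightarrow{\ (a_1,\dots,a_m)\,\mapsto\, a_1 \wedge \cdots \wedge a_m\ } \Lambda^m(\mathsf{L}) \xrightarrow{\ \mathcal{A}\ } \mathsf{M}. \tag{10.36}$$

In this diagram, the composite mapping $\mathsf{L}^m \to \mathsf{M}$ is $(a_1, \dots, a_m) \mapsto [a_1, \dots, a_m] = \mathcal{A}(a_1 \wedge \cdots \wedge a_m)$.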

Let us note that although $\mathsf{L}^m = \mathsf{L} \times \cdots \times \mathsf{L}$ (m-fold product) is clearly a vector space, we by no means assert that the mapping

$$(a_1, \dots, a_m) \mapsto [a_1, \dots, a_m]$$

discussed in Property 10.18 is a linear transformation $\mathsf{L}^m \to \mathsf{M}$. In general, such is not the case. For example, the exterior product $a_1 \wedge \cdots \wedge a_m : \mathsf{L}^m \to \Lambda^m(\mathsf{L})$ itself is not a linear transformation in the case that $\dim \mathsf{L} > m + 1$ and $m > 1$. Indeed, the image of the exterior product is the set of decomposable vectors described by their Plücker relations, which is not a vector subspace of $\Lambda^m(\mathsf{L})$.


Proof of Property 10.18 We can construct a linear transformation $\Psi : \mathsf{M}^* \to \Omega^m(\mathsf{L})$ such that it maps every linear function $f \in \mathsf{M}^*$ to the function $\Psi(f) \in \Omega^m(\mathsf{L})$ defined by the relationship

$$\Psi(f) = f([a_1, \dots, a_m]). \tag{10.37}$$

By Properties 10.11-10.13, which, by assumption, are satisfied by $[a_1, \dots, a_m]$, the mapping $\Psi(f)$ thus constructed is a multilinear and antisymmetric function of $a_1, \dots, a_m$. Therefore, $\Psi : \mathsf{M}^* \to \Omega^m(\mathsf{L})$ is a linear transformation. Let us define $\mathcal{A}$ as the dual mapping

$$\mathcal{A} = \Psi^* : \Lambda^m(\mathsf{L}) = \Omega^m(\mathsf{L})^* \to \mathsf{M} = \mathsf{M}^{**}.$$

By definition of the dual transformation (formula (3.58)), for every linear function $F$ on the space $\Omega^m(\mathsf{L})$, its image $\mathcal{A}(F)$ is a linear function on the space $\mathsf{M}^*$ such that $\mathcal{A}(F)(f) = F(\Psi(f))$ for all $f \in \mathsf{M}^*$. Applying formula (10.37) to the right-hand side of the last equality, we obtain the equality

$$\mathcal{A}(F)(f) = F(\Psi(f)) = F(f([a_1, \dots, a_m])). \tag{10.38}$$

Setting in (10.38) the function $F(\Psi) = \Psi(a_1, \dots, a_m)$, that is, $F = a_1 \wedge \cdots \wedge a_m$, we arrive at the relationship

$$\mathcal{A}(a_1 \wedge \cdots \wedge a_m)(f) = f([a_1, \dots, a_m]), \tag{10.39}$$

whose left-hand side is an element of the space $\mathsf{M}^{**}$, which is isomorphic to $\mathsf{M}$.

Let us recall that the identification (isomorphism) of the spaces $\mathsf{M}^{**}$ and $\mathsf{M}$ can be obtained by mapping each vector $\psi(f) \in \mathsf{M}^{**}$ to the vector $x \in \mathsf{M}$ for which the equality $f(x) = \psi(f)$ is satisfied for every linear function $f \in \mathsf{M}^*$. Then formula (10.39) gives the relationship

$$f(\mathcal{A}(a_1 \wedge \cdots \wedge a_m)) = f([a_1, \dots, a_m]),$$

which is valid for every function $f \in \mathsf{M}^*$. Consequently, from this we obtain the required relationship

$$\mathcal{A}(a_1 \wedge \cdots \wedge a_m) = [a_1, \dots, a_m]. \tag{10.40}$$

Equality (10.40) determines the values of the linear transformation $\mathcal{A}$ on all decomposable vectors $x \in \Lambda^m(\mathsf{L})$. But above, we saw that every m-vector is a linear combination of decomposable vectors. The transformation $\mathcal{A}$ is linear, and therefore, it is uniquely defined for all m-vectors. Thus we obtain the required linear transformation $\mathcal{A} : \Lambda^m(\mathsf{L}) \to \mathsf{M}$. □
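A familiar instance of the universality property, offered here only as an illustration, is the cross product in a 3-dimensional coordinate space: it is bilinear and antisymmetric, so by Property 10.18 it should factor through $\Lambda^2(\mathsf{L})$. In the Python sketch below, the linear map `A` is the one determined by $\mathcal{A}(e_i \wedge e_j) = [e_i, e_j]$; the function names and sample vectors are hypothetical.

```python
def wedge2(a, b):
    """Coordinates of a ^ b for vectors of a 3-dimensional space, keyed by (1,2), (1,3), (2,3)."""
    return {(i, j): a[i - 1] * b[j - 1] - a[j - 1] * b[i - 1]
            for i, j in [(1, 2), (1, 3), (2, 3)]}

def cross(a, b):
    """The bracket [a, b]: the usual cross product, which is bilinear and antisymmetric."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def A(x):
    """Linear map on 2-vectors fixed by A(e1^e2) = e3, A(e1^e3) = -e2, A(e2^e3) = e1."""
    return (x[(2, 3)], -x[(1, 3)], x[(1, 2)])

a, b = (1, 2, 3), (-1, 0, 4)
print(A(wedge2(a, b)), cross(a, b))
```

The map `A` is linear in the coordinates of the 2-vector, whereas neither `wedge2` nor `cross` is linear as a function of the pair `(a, b)`.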


10.4 Exterior Algebras* 

In many branches of mathematics, an important role is played by the expression

$$a_1 \wedge \cdots \wedge a_m,$$

understood not so much as a function of $m$ vectors $a_1, \dots, a_m$ of the space $\mathsf{L}$ with values in $\Lambda^m(\mathsf{L})$, but more as the result of repeated (m-fold) application of the operation consisting in mapping two vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$ to the vector $x \wedge y \in \Lambda^{p+q}(\mathsf{L})$. For example, the expression $a \wedge b \wedge c$ can then be calculated “by parts.” That is, it can be represented in the form $a \wedge b \wedge c = (a \wedge b) \wedge c$ and computed by first calculating $a \wedge b$, and then $(a \wedge b) \wedge c$.

To accomplish this, we have first to define the function mapping two vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$ to the vector $x \wedge y \in \Lambda^{p+q}(\mathsf{L})$. As a first step, such a function $x \wedge y$ will be defined for the case that the vector $y \in \Lambda^q(\mathsf{L})$ is decomposable, that is, representable in the form

$$y = a_1 \wedge a_2 \wedge \cdots \wedge a_q, \quad a_i \in \mathsf{L}. \tag{10.41}$$

Let us consider the mapping that assigns to p vectors b \, . . . , b p of the space L 
the vector 


[b i , . . . , b p ] — b\A-'Ab p Aa\A'-Aa 
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and let us apply to it Property 10.18 (universality) from the previous section. We 
thereby obtain the diagram 


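$$
\begin{array}{ccc}
\mathsf{L}^p & \longrightarrow & \Lambda^p(\mathsf{L}) \\
 & \searrow & \big\downarrow \mathcal{A} \\
 & & \Lambda^{p+q}(\mathsf{L})
\end{array}
\tag{10.42}
$$

In this diagram,

$$\mathcal{A}(b_1 \wedge \cdots \wedge b_p) = [b_1, \ldots, b_p].$$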

Definition 10.19 Let $y$ be a decomposable vector, that is, it can be written in the form (10.41). Then for every vector $x \in \Lambda^p(\mathsf{L})$, its image $\mathcal{A}(x)$ for the transformation $\mathcal{A} : \Lambda^p(\mathsf{L}) \to \Lambda^{p+q}(\mathsf{L})$ constructed above is denoted by $x \wedge y = x \wedge (a_1 \wedge \cdots \wedge a_q)$ and is called the exterior product of vectors $x$ and $y$.

Thus as a first step, we defined $x \wedge y$ in the case that the vector $y$ is decomposable. In order to define $x \wedge y$ for an arbitrary vector $y \in \Lambda^q(\mathsf{L})$, it suffices simply to repeat the same argument. Indeed, let us consider the mapping $[a_1, \ldots, a_q] : \mathsf{L}^q \to \Lambda^{p+q}(\mathsf{L})$ defined by the formula

$$[a_1, \ldots, a_q] = x \wedge (a_1 \wedge \cdots \wedge a_q).$$

We again obtain, on the basis of Property 10.18, the same diagram:


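$$
\begin{array}{ccc}
\mathsf{L}^q & \longrightarrow & \Lambda^q(\mathsf{L}) \\
 & \searrow & \big\downarrow \mathcal{A} \\
 & & \Lambda^{p+q}(\mathsf{L})
\end{array}
\tag{10.43}
$$

where the transformation $\mathcal{A} : \Lambda^q(\mathsf{L}) \to \Lambda^{p+q}(\mathsf{L})$ is defined by the formula

$$\mathcal{A}(a_1 \wedge \cdots \wedge a_q) = [a_1, \ldots, a_q].$$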

Definition 10.20 For any vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$, the exterior product $x \wedge y$ is the vector $\mathcal{A}(y) \in \Lambda^{p+q}(\mathsf{L})$ in diagram (10.43) constructed above.

Let us note some properties of the exterior product that follow from this definition.

Property 10.21 For any vectors $x_1, x_2 \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$, we have the relationship

$$(x_1 + x_2) \wedge y = x_1 \wedge y + x_2 \wedge y.$$

Similarly, for any vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$ and any scalar $\alpha$, we have the relationship

$$(\alpha x) \wedge y = \alpha (x \wedge y).$$

Both equalities follow immediately from the definitions and the linearity of the transformation $\mathcal{A}$ in diagram (10.43).

Property 10.22 For any vectors $x \in \Lambda^p(\mathsf{L})$ and $y_1, y_2 \in \Lambda^q(\mathsf{L})$, we have the relationship

$$x \wedge (y_1 + y_2) = x \wedge y_1 + x \wedge y_2.$$

Similarly, for any vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$ and any scalar $\alpha$, we have the relationship

$$x \wedge (\alpha y) = \alpha (x \wedge y).$$

Both equalities follow immediately from the definitions and the linearity of the transformations $\mathcal{A}$ in diagrams (10.42) and (10.43).

Property 10.23 For decomposable vectors $x = a_1 \wedge \cdots \wedge a_p$ and $y = b_1 \wedge \cdots \wedge b_q$, we have the relationship

$$x \wedge y = a_1 \wedge \cdots \wedge a_p \wedge b_1 \wedge \cdots \wedge b_q.$$

This follows at once from the definition.

Let us note that we have actually defined the exterior product in such a way that Properties 10.21-10.23 are satisfied. Indeed, Property 10.23 defines the exterior product of decomposable vectors. And since every vector is a linear combination of decomposable vectors, it follows that Properties 10.21 and 10.22 define it in the general case. The property of universality of the exterior product has been necessary for verifying that the result $x \wedge y$ does not depend on the choice of linear combinations of decomposable vectors that we use to represent the vectors $x$ and $y$.

Finally, let us make note of the following equally simple property.


Property 10.24 For any vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$, we have the relationship

$$x \wedge y = (-1)^{pq}\, y \wedge x. \tag{10.44}$$

Both vectors on the right- and left-hand sides of equality (10.44) belong to the space $\Lambda^{p+q}(\mathsf{L})$, that is, by definition, they are linear functions on $\Omega^{p+q}(\mathsf{L})$. Since every vector is a linear combination of decomposable vectors, it suffices that we verify equality (10.44) for decomposable vectors.

Let $x = a_1 \wedge \cdots \wedge a_p$, $y = b_1 \wedge \cdots \wedge b_q$, and let $F$ be any vector of the space $\Omega^{p+q}(\mathsf{L})$, that is, $F$ is an antisymmetric function of the vectors $x_1, \ldots, x_{p+q}$ in $\mathsf{L}$. Then equality (10.44) means that

$$F(a_1, \ldots, a_p, b_1, \ldots, b_q) = (-1)^{pq} F(b_1, \ldots, b_q, a_1, \ldots, a_p). \tag{10.45}$$

But equality (10.45) is an obvious consequence of the antisymmetry of the function $F$. Indeed, in order to place the vector $b_1$ in the first position on the left-hand side of (10.45), we must change the position of $b_1$ with each vector $a_1, \ldots, a_p$ in turn. One such transposition reverses the sign, and altogether, the transpositions multiply $F$ by $(-1)^p$. Similarly, in order to place the vector $b_2$ in the second position on the left-hand side of (10.45), we also must execute $p$ transpositions, and the value of $F$ is again multiplied by $(-1)^p$. And in order to place all vectors $b_1, \ldots, b_q$ at the beginning, it is necessary to multiply $F$ by $(-1)^p$ a total of $q$ times, and this ends up as (10.45).
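The sign count in this argument can be checked mechanically. The following is a minimal sketch in Python and is not part of the original text: the function names are our own, and the vectors $a_i$, $b_j$ are represented only by integer labels. It counts the inversions of the rearrangement of $(a_1, \ldots, a_p, b_1, \ldots, b_q)$ into $(b_1, \ldots, b_q, a_1, \ldots, a_p)$ and compares the resulting sign with $(-1)^{pq}$.

```python
def permutation_sign(perm):
    """Sign of a rearrangement, computed from its number of inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def block_swap_sign(p, q):
    """Sign picked up when (a_1..a_p, b_1..b_q) becomes (b_1..b_q, a_1..a_p)."""
    # Label a_i by 0..p-1 and b_j by p..p+q-1; after the swap the b-labels come first.
    swapped = list(range(p, p + q)) + list(range(p))
    return permutation_sign(swapped)


# Every b-label precedes every a-label, so there are exactly p*q inversions.
for p in range(5):
    for q in range(5):
        assert block_swap_sign(p, q) == (-1) ** (p * q)
```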


Our next step consists in uniting all the sets $\Lambda^p(\mathsf{L})$ into a single set $\Lambda(\mathsf{L})$ and defining the exterior product for its elements. Here we encounter a special case of a very important algebraic notion, that of an algebra.²


Definition 10.25 An algebra (over some field $\mathbb{K}$, which we shall consider to consist of numbers) is a vector space $A$ on which, besides the operations of addition of vectors and multiplication of a vector by a scalar, is also defined the operation $A \times A \to A$, called the product, assigning to every pair of elements $a, b \in A$ the element $ab \in A$ and satisfying the following conditions:

(1) the distributive property: for all $a, b, c \in A$, we have the relationship

$$(a + b)c = ac + bc, \quad c(a + b) = ca + cb; \tag{10.46}$$

(2) for all $a, b \in A$ and every scalar $\alpha \in \mathbb{K}$, we have the relationship

$$(\alpha a)b = a(\alpha b) = \alpha(ab); \tag{10.47}$$

(3) there exists an element $e \in A$, called the identity, such that for every $a \in A$, we have $ea = a$ and $ae = a$.


Let us note that there can be only one identity element in an algebra. Indeed, if there existed another identity element $e'$, then by definition, we would have the equalities $ee' = e'$ and $ee' = e$, from which it follows that $e = e'$.


2 This is not a very felicitous term, since it coincides with the name of a branch of mathematics, the 
one we are currently studying. But the term has taken root, and we are stuck with it. 
As in any vector space, in an algebra we have, for every $a \in A$, the equality $0 \cdot a = 0$ (here the 0 on the left denotes the scalar zero in the field $\mathbb{K}$, while the 0 on the right denotes the null element of the vector space $A$ that is an algebra).

If an algebra $A$ is finite-dimensional as a vector space and $e_1, \ldots, e_n$ is a basis of $A$, then the elements $e_1, \ldots, e_n$ are said to form a basis of the algebra $A$, where the number $n$ is called its dimension and is denoted by $\dim A = n$. For an algebra $A$ of finite dimension $n$, the product of two of its basis elements can be represented in the form

$$e_i e_j = \sum_{k=1}^{n} \alpha_{ij}^k e_k, \quad i, j = 1, \ldots, n, \tag{10.48}$$

where $\alpha_{ij}^k \in \mathbb{K}$ are certain scalars.

The totality of all scalars $\alpha_{ij}^k$ for all $i, j, k = 1, \ldots, n$ is called the multiplication table of the algebra $A$, and it uniquely determines the product for all the elements of the algebra. Indeed, if $x = \lambda_1 e_1 + \cdots + \lambda_n e_n$ and $y = \mu_1 e_1 + \cdots + \mu_n e_n$, then repeatedly applying the rules (10.46) and (10.47) and taking into account (10.48), we obtain

$$xy = \sum_{i,j,k=1}^{n} \lambda_i \mu_j \alpha_{ij}^k e_k, \tag{10.49}$$

that is, the product $xy$ is uniquely determined by the coordinates of the vectors $x$, $y$ and the multiplication table of the algebra $A$. And conversely, it is obvious that for any given multiplication table, formula (10.49) defines in an $n$-dimensional vector space an operation of multiplication satisfying all the requirements entering into the definition of an algebra, except, perhaps, property 3, which requires further consideration; that is, it converts this vector space into an algebra of the same dimension $n$.
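Formula (10.49) translates directly into a computation. The following Python sketch is an illustration only: the names `multiply` and `COMPLEX_TABLE` are hypothetical, and the example algebra (the complex numbers viewed as a two-dimensional real algebra with basis $e_1 = 1$, $e_2 = i$) is our own choice rather than one made in the text. The entry `table[i][j][k]` plays the role of the scalar $\alpha_{ij}^k$, with indices starting from 0.

```python
def multiply(x, y, table):
    """Product of coordinate vectors x, y using structure constants table[i][j][k]."""
    n = len(x)
    result = [0] * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Term lambda_i * mu_j * alpha_{ij}^k of formula (10.49).
                result[k] += x[i] * y[j] * table[i][j][k]
    return result


# Assumed example: basis e_1 = 1, e_2 = i of the complex numbers over the reals.
COMPLEX_TABLE = [
    [[1, 0], [0, 1]],   # e_1 e_1 = e_1,  e_1 e_2 = e_2
    [[0, 1], [-1, 0]],  # e_2 e_1 = e_2,  e_2 e_2 = -e_1
]

# (1 + 2i)(3 + 4i) = -5 + 10i
print(multiply([1, 2], [3, 4], COMPLEX_TABLE))  # [-5, 10]
```

Here the vector $(1, 0)$ acts as the identity, which illustrates why property 3 has to be checked separately for each multiplication table.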

Definition 10.26 An algebra $A$ is said to be associative if for every collection of three elements $a$, $b$, and $c$, we have the relationship

$$(ab)c = a(bc). \tag{10.50}$$

The associative property makes it possible to calculate the product of any number of elements $a_1, \ldots, a_m$ of an algebra $A$ without indicating the arrangement of parentheses among them; see the discussion on p. xv. Clearly, it suffices to verify the associative property of a finite-dimensional algebra for elements of some basis. We have already encountered some examples of algebras.

Example 10.27 The algebra of all square matrices of order $n$. It has the finite dimension $n^2$, and as we saw in Sect. 2.9, it is associative.

Example 10.28 The algebra of all polynomials in $n > 0$ variables with numeric coefficients. This algebra is also associative, but its dimension is infinite.

Now we shall define for a vector space $\mathsf{L}$ of finite dimension $n$ its exterior algebra $\Lambda(\mathsf{L})$. This algebra has many different applications (some of them will be discussed in the following section); its introduction is one more reason why in Sect. 10.3, we did not limit our consideration to decomposable vectors only, which were sufficient for describing vector subspaces.

Let us define the exterior algebra $\Lambda(\mathsf{L})$ as a direct sum of spaces $\Lambda^p(\mathsf{L})$, $p \ge 0$, which consist of more than just the one null vector, where $\Lambda^0(\mathsf{L})$ is by definition equal to $\mathbb{K}$. Since as a result of the antisymmetry of the exterior product we have $\Lambda^p(\mathsf{L}) = (0)$ for all $p > n$, we obtain the following definition of an exterior algebra:

$$\Lambda(\mathsf{L}) = \Lambda^0(\mathsf{L}) \oplus \Lambda^1(\mathsf{L}) \oplus \cdots \oplus \Lambda^n(\mathsf{L}). \tag{10.51}$$

Thus every element $u$ of the constructed vector space $\Lambda(\mathsf{L})$ can be represented in the form $u = u_0 + u_1 + \cdots + u_n$, where $u_i \in \Lambda^i(\mathsf{L})$.

Our present goal is the definition of the exterior product in $\Lambda(\mathsf{L})$, which we denote by $u \wedge v$ for arbitrary vectors $u, v \in \Lambda(\mathsf{L})$. We shall define the exterior product $u \wedge v$ of vectors

$$u = u_0 + u_1 + \cdots + u_n, \quad v = v_0 + v_1 + \cdots + v_n, \quad u_i, v_i \in \Lambda^i(\mathsf{L}),$$

as the element

$$u \wedge v = \sum_{i,j=0}^{n} u_i \wedge v_j,$$

where we use the fact that the exterior product $u_i \wedge v_j$ is already defined as an element of the space $\Lambda^{i+j}(\mathsf{L})$. Thus

$$u \wedge v = w_0 + w_1 + \cdots + w_n, \quad \text{where } w_k = \sum_{i+j=k} u_i \wedge v_j, \quad w_k \in \Lambda^k(\mathsf{L}).$$

A simple verification shows that for the exterior product thus defined, all the conditions for the definition of an algebra are satisfied. This follows at once from the properties of the exterior product $x \wedge y$ of vectors $x \in \Lambda^i(\mathsf{L})$ and $y \in \Lambda^j(\mathsf{L})$ proved earlier. By definition, $\Lambda^0(\mathsf{L}) = \mathbb{K}$, and the number 1 (the identity in the field $\mathbb{K}$) is the identity in the exterior algebra $\Lambda(\mathsf{L})$.

Definition 10.29 A finite-dimensional algebra $A$ is called a graded algebra if there is given a decomposition of the vector space $A$ into a direct sum of subspaces $A_i \subset A$,

$$A = A_0 \oplus A_1 \oplus \cdots \oplus A_k, \tag{10.52}$$

and the following conditions are satisfied: for all vectors $x \in A_i$ and $y \in A_j$, the product $xy$ is in $A_{i+j}$ if $i + j \le k$, and $xy = 0$ if $i + j > k$. Here the decomposition (10.52) is called a grading.
In this case, $\dim A = \dim A_0 + \cdots + \dim A_k$, and taking the union of the bases of the subspaces $A_i$, we obtain a basis of the space $A$. The decomposition (10.51) and the definition of the exterior product show that the exterior algebra $\Lambda(\mathsf{L})$ is graded if the space $\mathsf{L}$ has finite dimension $n$. Since $\Lambda^p(\mathsf{L}) = (0)$ for all $p > n$, it follows that

$$\dim \Lambda(\mathsf{L}) = \sum_{p=0}^{n} \dim \Lambda^p(\mathsf{L}) = \sum_{p=0}^{n} C_n^p = 2^n.$$
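The count $2^n$ is easy to confirm by listing the index collections that label a basis. The short Python sketch below is our own illustration (the function name `exterior_basis` is hypothetical): it enumerates all increasing collections of indices from $1, \ldots, n$ of every length $p$, with the empty collection standing for the scalar 1 in $\Lambda^0(\mathsf{L})$.

```python
from itertools import combinations
from math import comb


def exterior_basis(n):
    """Increasing index tuples from 1..n of every length p = 0..n."""
    return [I for p in range(n + 1) for I in combinations(range(1, n + 1), p)]


basis = exterior_basis(3)
print(basis)
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
assert len(basis) == 2 ** 3
# The number of collections of length p is the binomial coefficient C_n^p.
assert [sum(1 for I in basis if len(I) == p) for p in range(4)] == [comb(3, p) for p in range(4)]
```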

In an arbitrary graded algebra $A$ with grading (10.52), the elements of the subspace $A_i$ are called homogeneous elements of degree $i$, and for every $u \in A_i$, we write $i = \deg u$. One often encounters graded algebras of infinite dimension, and in this case, the grading (10.52) contains, in general, not a finite, but an infinite number of terms. For example, in the algebra of polynomials (Example 10.28), a grading is defined by the decomposition of a polynomial into homogeneous components.

Property (10.44) of the exterior product that we have proved shows that in an exterior algebra $\Lambda(\mathsf{L})$, we have for all homogeneous elements $u$ and $v$ the relationship

$$u \wedge v = (-1)^d\, v \wedge u, \quad \text{where } d = \deg u \deg v. \tag{10.53}$$

Let us prove that for every finite-dimensional vector space $\mathsf{L}$, the exterior algebra $\Lambda(\mathsf{L})$ is associative. As we noted above, it suffices to prove the associative property for some basis of the algebra. Such a basis can be constructed out of homogeneous elements, and we may even choose them to be decomposable. Thus we may suppose that the elements $a, b, c \in \Lambda(\mathsf{L})$ are equal to

$$a = a_1 \wedge \cdots \wedge a_p, \quad b = b_1 \wedge \cdots \wedge b_q, \quad c = c_1 \wedge \cdots \wedge c_r,$$

and in this case, using the properties proved above, we obtain

$$a \wedge (b \wedge c) = a_1 \wedge \cdots \wedge a_p \wedge b_1 \wedge \cdots \wedge b_q \wedge c_1 \wedge \cdots \wedge c_r = (a \wedge b) \wedge c.$$

An associative graded algebra that satisfies relationship (10.53) for all pairs of homogeneous elements is called a superalgebra. Thus an exterior algebra $\Lambda(\mathsf{L})$ of an arbitrary finite-dimensional vector space $\mathsf{L}$ is a superalgebra, and it is the most important example of this concept.

Let us now return to the exterior algebra $\Lambda(\mathsf{L})$ of the finite-dimensional vector space $\mathsf{L}$. Let us choose in it a convenient basis and determine its multiplication table.

Let us fix in the space $\mathsf{L}$ an arbitrary basis $e_1, \ldots, e_n$. Since the elements $\varphi_I = e_{i_1} \wedge \cdots \wedge e_{i_m}$ for all possible collections $I = (i_1, \ldots, i_m)$ in $\overrightarrow{\mathbb{N}}_n^m$ form a basis of the space $\Lambda^m(\mathsf{L})$, $m > 0$, it follows from decomposition (10.51) that a basis in $\Lambda(\mathsf{L})$ is obtained as the union of the bases of the subspaces $\Lambda^m(\mathsf{L})$ for all $m = 1, \ldots, n$ and the basis of the subspace $\Lambda^0(\mathsf{L}) = \mathbb{K}$, consisting of a single nonnull scalar, for example 1. This means that all such elements $\varphi_I$, $I \in \overrightarrow{\mathbb{N}}_n^m$, $m = 1, \ldots, n$, together with 1 form a basis of the exterior algebra $\Lambda(\mathsf{L})$. Since the exterior product with 1 is trivial, it follows that in order to compose a multiplication table in the constructed basis, we must find the exterior product $\varphi_I \wedge \varphi_J$ for all possible collections of indices $I \in \overrightarrow{\mathbb{N}}_n^p$ and $J \in \overrightarrow{\mathbb{N}}_n^q$ for all $1 \le p, q \le n$.

In view of Property 10.23 on page 369, the exterior product $\varphi_I \wedge \varphi_J$ is equal to

$$\varphi_I \wedge \varphi_J = e_{i_1} \wedge \cdots \wedge e_{i_p} \wedge e_{j_1} \wedge \cdots \wedge e_{j_q}. \tag{10.54}$$

Here there are two possibilities. If the collections $I$ and $J$ contain at least one index in common, then by Corollary 10.14 (p. 363), the product (10.54) is equal to zero.

If, on the other hand, $I \cap J = \varnothing$, then we shall denote by $K$ the collection in $\overrightarrow{\mathbb{N}}_n^{p+q}$ comprising the indices belonging to the set $I \cup J$, that is, in other words, $K$ is obtained by arranging the collection $(i_1, \ldots, i_p, j_1, \ldots, j_q)$ in ascending order. Then, as is easily verified, the exterior product (10.54) differs from the element $\varphi_K$, $K \in \overrightarrow{\mathbb{N}}_n^{p+q}$, belonging to the basis of the exterior algebra $\Lambda(\mathsf{L})$ constructed above in that the indices of the collection $I \cup J$ are not necessarily arranged in ascending order. In order to obtain from (10.54) the element $\varphi_K$, $K \in \overrightarrow{\mathbb{N}}_n^{p+q}$, it is necessary to interchange the indices $(i_1, \ldots, i_p, j_1, \ldots, j_q)$ in such a way that the resulting collection is increasing. Then by Theorems 2.23 and 2.25 from Sect. 2.6 and Property 10.13, according to which the exterior product changes sign under the transposition of any two vectors, we obtain that

$$\varphi_I \wedge \varphi_J = \varepsilon(I, J)\, \varphi_K, \quad K \in \overrightarrow{\mathbb{N}}_n^{p+q},$$

where the number $\varepsilon(I, J)$ is equal to $+1$ or $-1$ depending on whether the number of transpositions necessary for passing from $(i_1, \ldots, i_p, j_1, \ldots, j_q)$ to the collection $K \in \overrightarrow{\mathbb{N}}_n^{p+q}$ is even or odd.

As a result, we see that in the constructed basis of the exterior algebra $\Lambda(\mathsf{L})$, the multiplication table assumes the following form:

$$
\varphi_I \wedge \varphi_J =
\begin{cases}
0 & \text{if } I \cap J \ne \varnothing, \\
\varepsilon(I, J)\, \varphi_K & \text{if } I \cap J = \varnothing.
\end{cases}
\tag{10.55}
$$
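The multiplication table (10.55) can be sketched in a few lines of Python. This is an illustration under our own conventions, not code from the text: the name `wedge_basis` is hypothetical, index collections are represented as increasing tuples of integers, and the sign $\varepsilon(I, J)$ is obtained from the parity of the number of inversions in the concatenated collection, which has the same parity as any sequence of transpositions that sorts it.

```python
def wedge_basis(I, J):
    """Return (sign, K) with phi_I ^ phi_J = sign * phi_K, or (0, None) if I, J overlap."""
    if set(I) & set(J):
        return 0, None  # a repeated index makes the product vanish
    merged = list(I) + list(J)
    inversions = sum(
        1
        for a in range(len(merged))
        for b in range(a + 1, len(merged))
        if merged[a] > merged[b]
    )
    sign = -1 if inversions % 2 else 1
    return sign, tuple(sorted(merged))


print(wedge_basis((1, 3), (2,)))    # (-1, (1, 2, 3))
print(wedge_basis((1, 2), (2, 3)))  # (0, None)
```

For disjoint collections of lengths $p$ and $q$, swapping the arguments changes the sign by $(-1)^{pq}$, in agreement with (10.44).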


10.5 Appendix* 

The exterior product $x \wedge y$ of vectors $x \in \Lambda^p(\mathsf{L})$ and $y \in \Lambda^q(\mathsf{L})$ defined in the previous section makes it possible in many cases to give simple proofs of assertions that we encountered earlier.

Example 10.30 Let us consider the case $p = n$, using the notation and results of the previous section. As we have seen, $\dim \Lambda^p(\mathsf{L}) = C_n^p$, and therefore, the space $\Lambda^n(\mathsf{L})$ is one-dimensional, and each of its nonzero vectors constitutes a basis. If $e$ is such a vector, then an arbitrary vector of the space $\Lambda^n(\mathsf{L})$ can be written in the form $\alpha e$ with a suitable scalar $\alpha$. Thus for any $n$ vectors $x_1, \ldots, x_n$ of the space $\mathsf{L}$, we obtain the relationship

$$x_1 \wedge \cdots \wedge x_n = \alpha(x_1, \ldots, x_n)\, e, \tag{10.56}$$

where $\alpha(x_1, \ldots, x_n)$ is some function of $n$ vectors taking numeric values from the field $\mathbb{K}$. By Properties 10.11, 10.12, and 10.13, this function is multilinear and antisymmetric.

Let us choose in the space $\mathsf{L}$ some basis $e_1, \ldots, e_n$ and set

$$x_i = x_{i1} e_1 + \cdots + x_{in} e_n, \quad i = 1, \ldots, n.$$

The choice of a basis defines an isomorphism of the space $\mathsf{L}$ and the space $\mathbb{K}^n$ of rows of length $n$, in which the vector $x_i$ corresponds to the row $(x_{i1}, \ldots, x_{in})$. Thus $\alpha$ becomes a multilinear and antisymmetric function of $n$ rows taking numeric values. By Theorem 2.15, the function $\alpha(x_1, \ldots, x_n)$ coincides up to a scalar multiple $k(e)$ with the determinant of the square matrix of order $n$ consisting of the coordinates $x_{ij}$ of the vectors $x_1, \ldots, x_n$:

$$
\alpha(x_1, \ldots, x_n) = k(e) \cdot
\begin{vmatrix}
x_{11} & \cdots & x_{1n} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{nn}
\end{vmatrix}.
\tag{10.57}
$$

The arbitrariness of the choice of coefficient $k(e)$ in formula (10.57) corresponds to the arbitrariness of the choice of basis $e$ in the one-dimensional space $\Lambda^n(\mathsf{L})$ (let us recall that the basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$ is fixed).

In particular, let us choose as basis of the space $\Lambda^n(\mathsf{L})$ the vector

$$e = e_1 \wedge \cdots \wedge e_n. \tag{10.58}$$

Vectors $e_1, \ldots, e_n$ are linearly independent. Therefore, by Property 10.15 (p. 363), the vector $e$ is nonnull. We therefore obviously obtain that $k(e) = 1$. Indeed, since the coefficient $k(e)$ in formula (10.57) is one and the same for all collections of vectors $x_1, \ldots, x_n$, we can calculate it by setting $x_i = e_i$, $i = 1, \ldots, n$. Comparing in this case formulas (10.56) and (10.58), we see that $\alpha(e_1, \ldots, e_n) = 1$. Substituting this value into relationship (10.57) for $x_i = e_i$, $i = 1, \ldots, n$, and noting that the determinant on the right-hand side of (10.57) is the determinant of the identity matrix, that is, equal to 1, we conclude that $k(e) = 1$.

Using definitions given earlier, we may associate the linear transformation $\Lambda^n(\mathcal{A}) : \Lambda^n(\mathsf{L}) \to \Lambda^n(\mathsf{L})$ with the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$. The transformation $\mathcal{A}$ can be defined by indicating to which vectors $x_1, \ldots, x_n$ it takes the basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$, that is, by specifying vectors $x_i = \mathcal{A}(e_i)$, $i = 1, \ldots, n$. By Lemma 10.17 (p. 365), we have the equality

$$\Lambda^n(\mathcal{A})(e_1 \wedge \cdots \wedge e_n) = \mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_n) = x_1 \wedge \cdots \wedge x_n = \alpha(x_1, \ldots, x_n)\, e. \tag{10.59}$$

On the other hand, as we know, all linear transformations of a one-dimensional space have the form $x \mapsto \alpha x$, where $\alpha$ is some scalar equal to the determinant of the given transformation and independent of the choice of basis $e$ in $\Lambda^n(\mathsf{L})$. Thus we obtain that $(\Lambda^n(\mathcal{A}))(x) = \alpha x$, where the scalar $\alpha$ is equal to the determinant $|\Lambda^n(\mathcal{A})|$ and clearly depends only on the transformation $\mathcal{A}$ itself, that is, it is determined by the collection of vectors $x_i = \mathcal{A}(e_i)$, $i = 1, \ldots, n$. It is not difficult to see that this scalar $\alpha$ coincides with the function $\alpha(x_1, \ldots, x_n)$ defined above. Indeed, let us choose in the space $\Lambda^n(\mathsf{L})$ a basis $e = e_1 \wedge \cdots \wedge e_n$. Then the required equality follows directly from formula (10.59).

Further, substituting into (10.59) expression (10.57) for $\alpha(x_1, \ldots, x_n)$, taking into account that $k(e) = 1$ and that the determinant on the right-hand side of (10.57) coincides with the determinant of the transformation $\mathcal{A}$, we obtain the following result:

$$\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_n) = |\mathcal{A}|\,(e_1 \wedge \cdots \wedge e_n). \tag{10.60}$$

This relationship gives the most invariant definition of the determinant of a linear transformation among all those that we have encountered.
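Relationship (10.60) can be turned into a small computation of the determinant. The Python sketch below is our own illustration with hypothetical names (`wedge_vector`, `wedge_all`): an element of $\Lambda(\mathsf{L})$ is stored as a dictionary from increasing index tuples (0-based) to coefficients, and vectors of $\mathsf{L}$ are multiplied in one at a time using the sign rule for reordering basis vectors. The coefficient of $e_1 \wedge \cdots \wedge e_n$ in the product of the images of the basis vectors is then the determinant.

```python
def wedge_vector(multivector, vector):
    """Multiply {increasing index tuple: coeff} on the right by a vector of L."""
    result = {}
    for I, coeff in multivector.items():
        for j, x in enumerate(vector):
            if x == 0 or j in I:
                continue  # a repeated basis vector gives zero
            # Moving e_j to its sorted place passes every index of I larger than j.
            swaps = sum(1 for i in I if i > j)
            K = tuple(sorted(I + (j,)))
            result[K] = result.get(K, 0) + (-1) ** swaps * coeff * x
    return result


def wedge_all(vectors):
    """Exterior product x_1 ^ ... ^ x_m of coordinate vectors."""
    product = {(): 1}  # the scalar 1 in Lambda^0(L)
    for v in vectors:
        product = wedge_vector(product, v)
    return product


# Coordinates of A(e_1), A(e_2), A(e_3) in the basis e_1, e_2, e_3 (an assumed example).
images = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(wedge_all(images))  # {(0, 1, 2): -3}
```

The value $-3$ agrees with the determinant of the matrix whose rows are the three coordinate vectors. For linearly dependent vectors the top coefficient comes out as 0, as Property 10.15 predicts.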

We obtained relationship (10.60) for an arbitrary basis $e_1, \ldots, e_n$ of the space $\mathsf{L}$, that is, for any $n$ linearly independent vectors of the space. But it is also true for any $n$ linearly dependent vectors $a_1, \ldots, a_n$ of this space. Indeed, in this case, the vectors $\mathcal{A}(a_1), \ldots, \mathcal{A}(a_n)$ are clearly also linearly dependent, and by Property 10.15, both exterior products $a_1 \wedge \cdots \wedge a_n$ and $\mathcal{A}(a_1) \wedge \cdots \wedge \mathcal{A}(a_n)$ are equal to zero. Thus for any $n$ vectors $a_1, \ldots, a_n$ of the space $\mathsf{L}$ and any linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$, we have the relationship

$$\mathcal{A}(a_1) \wedge \cdots \wedge \mathcal{A}(a_n) = |\mathcal{A}|\,(a_1 \wedge \cdots \wedge a_n). \tag{10.61}$$

In particular, if $\mathcal{B} : \mathsf{L} \to \mathsf{L}$ is some other linear transformation, then formula (10.60) for the transformation $\mathcal{B}\mathcal{A} : \mathsf{L} \to \mathsf{L}$ gives the analogous equality

$$\mathcal{B}\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{B}\mathcal{A}(e_n) = |\mathcal{B}\mathcal{A}|\,(e_1 \wedge \cdots \wedge e_n).$$

On the other hand, from the same formula we obtain that

$$\mathcal{B}(\mathcal{A}(e_1)) \wedge \cdots \wedge \mathcal{B}(\mathcal{A}(e_n)) = |\mathcal{B}|\,(\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_n)) = |\mathcal{B}|\,|\mathcal{A}|\,(e_1 \wedge \cdots \wedge e_n).$$

Hence it follows that $|\mathcal{B}\mathcal{A}| = |\mathcal{B}| \cdot |\mathcal{A}|$. This is almost a “tautological” proof of Theorem 2.54 on the determinant of the product of square matrices.

The arguments that we have presented acquire a more concrete character if $\mathsf{L}$ is an oriented Euclidean space. Then as the basis $e_1, \ldots, e_n$ in $\mathsf{L}$ we may choose an orthonormal and positively oriented basis. In this case, the basis (10.58) in $\Lambda^n(\mathsf{L})$ is uniquely defined, that is, it does not depend on the choice of basis $e_1, \ldots, e_n$. Indeed, if $e'_1, \ldots, e'_n$ is another such basis in $\mathsf{L}$, then as we know, there exists a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ such that $e'_i = \mathcal{A}(e_i)$, $i = 1, \ldots, n$, and furthermore, the transformation $\mathcal{A}$ is orthogonal and proper. But then $|\mathcal{A}| = 1$, and formula (10.60) shows that $e'_1 \wedge \cdots \wedge e'_n = e_1 \wedge \cdots \wedge e_n$.
Example 10.31 Let us show how from the given considerations, we obtain a proof of the Cauchy-Binet formula, which was stated but not proved in Sect. 2.9.

Let us recall that in that section, we considered the product of two matrices $B$ and $A$, the first of type $(m, n)$, and the second of type $(n, m)$, so that $BA$ is a square matrix of order $m$. We are required to obtain an expression for the determinant $|BA|$ in terms of the associated minors of the matrices $B$ and $A$. Minors of the matrices $B$ and $A$ are said to be associated if they are of the same order, namely the minimum of $n$ and $m$, and are located in the columns (of matrix $B$) and rows (of matrix $A$) of identical indices. The Cauchy-Binet formula asserts that the determinant $|BA|$ is equal to 0 if $n < m$, and that $|BA|$ is equal to the sum of the pairwise products over all the associated minors of order $m$ if $n \ge m$.

Since every matrix is the matrix of some linear transformation of vector spaces of suitable dimensions, we may formulate this problem as a question of the determinant of the product of linear transformations $\mathcal{A} : \mathsf{M} \to \mathsf{L}$ and $\mathcal{B} : \mathsf{L} \to \mathsf{M}$, where $\dim \mathsf{L} = n$ and $\dim \mathsf{M} = m$. Here it is assumed that we have chosen a basis $e_1, \ldots, e_m$ in the space $\mathsf{M}$ and a basis $f_1, \ldots, f_n$ in the space $\mathsf{L}$ such that the transformations $\mathcal{A}$ and $\mathcal{B}$ have matrices $A$ and $B$ respectively in these bases. Then $\mathcal{B}\mathcal{A}$ will be a linear transformation of the space $\mathsf{M}$ into itself with determinant $|\mathcal{B}\mathcal{A}| = |BA|$.

Let us first prove that $|BA| = 0$ if $n < m$. Since the image of the transformation, $\mathcal{B}\mathcal{A}(\mathsf{M})$, is a subset of $\mathcal{B}(\mathsf{L})$ and $\dim \mathcal{B}(\mathsf{L}) \le \dim \mathsf{L}$, it follows that in the case under consideration, we have the inequality

$$\dim(\mathcal{B}\mathcal{A}(\mathsf{M})) \le \dim \mathcal{B}(\mathsf{L}) \le \dim \mathsf{L} = n < m = \dim \mathsf{M},$$

from which it follows that the image of the transformation $\mathcal{B}\mathcal{A} : \mathsf{M} \to \mathsf{M}$ is not equal to the entire space $\mathsf{M}$, that is, the transformation $\mathcal{B}\mathcal{A}$ is singular. This means that $|\mathcal{B}\mathcal{A}| = 0$, that is, $|BA| = 0$.

Now let us consider the case $n \ge m$. Using Lemmas 10.16 and 10.17 from Sect. 10.3 with $p = m$, we obtain for the vectors of the basis $e_1, \ldots, e_m$ of the space $\mathsf{M}$ the relationship

$$\Lambda^m(\mathcal{B}\mathcal{A})(e_1 \wedge \cdots \wedge e_m) = \Lambda^m(\mathcal{B})\,\Lambda^m(\mathcal{A})(e_1 \wedge \cdots \wedge e_m) = \Lambda^m(\mathcal{B})(\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_m)). \tag{10.62}$$

The vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_m)$ are contained in the space $\mathsf{L}$ of dimension $n$, and their coordinates in the basis $f_1, \ldots, f_n$, being written in column form, form the matrix $A$ of the transformation $\mathcal{A} : \mathsf{M} \to \mathsf{L}$. Let us now write the coordinates of the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_m)$ in row form. We thereby obtain the transpose matrix $A^*$ of type $(m, n)$. Applying formula (10.22) to the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_m)$, we obtain the equality

$$\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_m) = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I \varphi_I \tag{10.63}$$

with the functions $\varphi_I$ defined by formula (10.20). In the expression (10.63), according to our definition, $M_I$ is the minor of the matrix $A^*$ occupying columns $i_1, \ldots, i_m$. It is obvious that such a minor $M_I$ of the matrix $A^*$ coincides with the minor of the matrix $A$ occupying rows with the same indices $i_1, \ldots, i_m$. Thus we may assume that in the sum on the right-hand side of (10.63), $M_I$ are the minors of order $m$ of the matrix $A$ corresponding to all possible ordered collections $I = (i_1, \ldots, i_m)$ of indices of its rows.

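Relationships (10.62) and (10.63) together give the equality

$$\Lambda^m(\mathcal{B}\mathcal{A})(e_1 \wedge \cdots \wedge e_m) = \Lambda^m(\mathcal{B})\Big(\sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I \varphi_I\Big). \tag{10.64}$$

Let us denote by $M_I$ and $N_I$ the associated minors of the matrices $A$ and $B$. This means that the minor $M_I$ occupies the rows of the matrix $A$ with indices $I = (i_1, \ldots, i_m)$, and the minor $N_I$ occupies the columns of the matrix $B$ with the same indices. Let us consider the restriction of the linear transformation $\mathcal{B} : \mathsf{L} \to \mathsf{M}$ to the subspace $\langle f_{i_1}, \ldots, f_{i_m} \rangle$. By the definition of the functions $\varphi_I$, we obtain that

$$\Lambda^m(\mathcal{B})(\varphi_I) = \mathcal{B}(f_{i_1}) \wedge \cdots \wedge \mathcal{B}(f_{i_m}) = N_I\,(e_1 \wedge \cdots \wedge e_m).$$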


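From this, taking into account formula (10.64), follows the relationship

$$
\begin{aligned}
\Lambda^m(\mathcal{B}\mathcal{A})(e_1 \wedge \cdots \wedge e_m)
&= \Lambda^m(\mathcal{B})\Big(\sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I \varphi_I\Big) \\
&= \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I\, \Lambda^m(\mathcal{B})(\varphi_I) \\
&= \Big(\sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I N_I\Big)(e_1 \wedge \cdots \wedge e_m).
\end{aligned}
$$

On the other hand, by Lemma 10.17 and formula (10.60), we have

$$\Lambda^m(\mathcal{B}\mathcal{A})(e_1 \wedge \cdots \wedge e_m) = \mathcal{B}\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{B}\mathcal{A}(e_m) = |\mathcal{B}\mathcal{A}|\,(e_1 \wedge \cdots \wedge e_m).$$

The last two equalities give us the relationship

$$|\mathcal{B}\mathcal{A}| = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I N_I,$$

which, taking into account the equality $|\mathcal{B}\mathcal{A}| = |BA|$, coincides with the Cauchy-Binet formula for the case $n \ge m$.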
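The Cauchy-Binet formula is easy to test numerically on small integer matrices. The following Python sketch is only an illustration: the helper names `det`, `matmul`, and `cauchy_binet` are our own, matrices are plain lists of rows, indices start from 0, and the determinant is computed by naive cofactor expansion, which is adequate only for small orders.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def matmul(B, A):
    """Product BA of matrices given as lists of rows."""
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)] for row in B]


def cauchy_binet(B, A):
    """Sum of products of associated minors of B (type (m, n)) and A (type (n, m))."""
    m, n = len(B), len(A)
    total = 0
    for I in combinations(range(n), m):  # empty when n < m, so the sum is 0
        N_I = det([[row[i] for i in I] for row in B])  # columns I of B
        M_I = det([A[i] for i in I])                   # rows I of A
        total += M_I * N_I
    return total


B = [[1, 2, 3], [4, 5, 6]]      # type (2, 3)
A = [[1, 0], [2, 1], [0, 3]]    # type (3, 2)
print(det(matmul(B, A)), cauchy_binet(B, A))  # -39 -39
```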


Example 10.32 Let us derive the formula for the determinant of a square matrix $A$ that generalizes the well-known formula for the expansion of the determinant along the $j$th column:

$$|A| = a_{1j} A_{1j} + a_{2j} A_{2j} + \cdots + a_{nj} A_{nj}, \tag{10.65}$$

where $A_{ij}$ is the cofactor of the element $a_{ij}$, that is, the number $(-1)^{i+j} M_{ij}$, and $M_{ij}$ is the minor obtained by deleting this element from the matrix $A$ along with the entire row and column at whose intersection it is located. The generalization consists in the fact that now we shall write down an analogous expansion of the determinant not along a single column, but along several, thereby generalizing in a suitable way the notion of the cofactor.

Let us consider a certain collection $I \in \overrightarrow{\mathbb{N}}_n^m$, where $m$ is a natural number in the range 1 to $n - 1$. Let us denote by $\bar{I}$ the collection obtained from $(1, \ldots, n)$ by discarding all indices entering into $I$. Clearly, $\bar{I} \in \overrightarrow{\mathbb{N}}_n^{n-m}$. Let us denote by $|I|$ the sum of all indices entering into the collection $I$, that is, we shall set $|I| = i_1 + \cdots + i_m$.

Let $A$ be an arbitrary square matrix of order $n$, and let $I = (i_1, \ldots, i_m)$ and $J = (j_1, \ldots, j_m)$ be two collections of indices in $\overrightarrow{\mathbb{N}}_n^m$. For the minor $M_{IJ}$ occupying the rows with indices $i_1, \ldots, i_m$ and columns with indices $j_1, \ldots, j_m$, let us call the number

$$A_{IJ} = (-1)^{|I| + |J|} M_{\bar{I}\bar{J}} \tag{10.66}$$

the cofactor. It is easy to see that the given definition is indeed a generalization of that given in Chap. 2 of the cofactor of a single element $a_{ij}$ for which $m = 1$ and the collections $I = (i)$, $J = (j)$ each consist of a single index.

Theorem 10.33 (Laplace’s theorem) The determinant of a matrix $A$ is equal to the sum of the products of all minors occupying any $m$ given columns (or rows) by their cofactors:

$$|A| = \sum_{J \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ} A_{IJ} = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ} A_{IJ},$$

where the number $m$ can be arbitrarily chosen in the range 1 to $n - 1$.

Remark 10.34 For $m = 1$ and $m = n - 1$, Laplace’s theorem gives formula (10.65) for the expansion of the determinant along a column and the analogous formula for expansion along a row. However, only in the general case is it possible to focus our attention on the symmetry between the minors of order $m$ and those of order $n - m$.

Proof of Theorem 10.33 Let us first of all note that since for the transpose matrix, its rows are converted into columns while the determinant is unchanged, it suffices to provide a proof for only one of the given equalities. For definiteness, let us prove the first: the formula for the expansion of the determinant $|A|$ along $m$ columns.

Let us consider a vector space $\mathsf{L}$ of dimension $n$ and an arbitrary basis $e_1, \ldots, e_n$ of $\mathsf{L}$. Let $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ be a linear transformation having in this basis the matrix $A$. Let us apply to the vectors of this basis a permutation such that the first $m$ positions are occupied by the vectors $e_{i_1}, \ldots, e_{i_m}$, the remaining $n - m$ positions by the vectors $e_{i_{m+1}}, \ldots, e_{i_n}$. In the basis thus obtained, the determinant of the transformation $\mathcal{A}$ will again be equal to $|A|$, since the determinant of the matrix of a transformation $\mathcal{A}$ does not depend on the choice of basis. Using formula (10.60), we obtain

$$
\begin{aligned}
\mathcal{A}(e_{i_1}) \wedge \cdots \wedge \mathcal{A}(e_{i_m}) \wedge \mathcal{A}(e_{i_{m+1}}) \wedge \cdots \wedge \mathcal{A}(e_{i_n})
&= |A|\,(e_{i_1} \wedge \cdots \wedge e_{i_m} \wedge e_{i_{m+1}} \wedge \cdots \wedge e_{i_n}) \\
&= |A|\,(\varphi_I \wedge \varphi_{\bar{I}}).
\end{aligned}
\tag{10.67}
$$

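Let us calculate the left-hand side of relationship (10.67), applying formula (10.22) to the two different groups of vectors.

First, let us set $a_1 = \mathcal{A}(e_{i_1}), \ldots, a_m = \mathcal{A}(e_{i_m})$. Then from (10.22), we obtain

$$\mathcal{A}(e_{i_1}) \wedge \cdots \wedge \mathcal{A}(e_{i_m}) = \sum_{J \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ}\, \varphi_J, \tag{10.68}$$

where $I = (i_1, \ldots, i_m)$, and $J$ runs through all collections from the set $\overrightarrow{\mathbb{N}}_n^m$.

Now let us replace the number $m$ by $n - m$ in (10.22) and apply the formula thus obtained to the vectors $a_1 = \mathcal{A}(e_{i_{m+1}}), \ldots, a_{n-m} = \mathcal{A}(e_{i_n})$. As a result, we obtain the equality

$$\mathcal{A}(e_{i_{m+1}}) \wedge \cdots \wedge \mathcal{A}(e_{i_n}) = \sum_{J' \in \overrightarrow{\mathbb{N}}_n^{n-m}} M_{\bar{I}J'}\, \varphi_{J'}, \tag{10.69}$$

where $\bar{I} = (i_{m+1}, \ldots, i_n)$ and $J'$ runs through all collections in the set $\overrightarrow{\mathbb{N}}_n^{n-m}$.

Substituting the expressions (10.68) and (10.69) into the left-hand side of (10.67), we obtain the equality

$$\sum_{J \in \overrightarrow{\mathbb{N}}_n^m} \; \sum_{J' \in \overrightarrow{\mathbb{N}}_n^{n-m}} M_{IJ} M_{\bar{I}J'}\, \varphi_J \wedge \varphi_{J'} = |A|\,(\varphi_I \wedge \varphi_{\bar{I}}). \tag{10.70}$$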

Let us calculate the exterior product $\varphi_I \wedge \varphi_{\bar{I}}$ for $p = m$ and $q = n - m$, making use of the multiplication table (10.55) that was obtained at the end of the previous section. In this case, it is obvious that the collection $K$ obtained by the union of $I$ and $\bar{I}$ is equal to $(1, \ldots, n)$, and we have only to calculate the number $\varepsilon(I, \bar{I}) = \pm 1$, which depends on whether the number of transpositions to get from $(i_1, \ldots, i_m, i_{m+1}, \ldots, i_n)$ to $K = (1, \ldots, n)$ is even or odd. It is not difficult to see (using, for example, the same reasoning as in Sect. 2.6) that the parity of this number of transpositions coincides with the parity of the number of pairs $(i, \bar{\imath})$, where $i \in I$ and $\bar{\imath} \in \bar{I}$, for which the indices $i$ and $\bar{\imath}$ are in reverse order (form an inversion), that is, $i > \bar{\imath}$. By definition, all indices less than $i_1$ appear in $\bar{I}$, and consequently, they form an inversion with $i_1$. This gives us $i_1 - 1$ pairs. Further, all numbers less than $i_2$ and belonging to $\bar{I}$ form an inversion with index $i_2$, that is, all numbers less than $i_2$ with the exception of $i_1$, which belongs to $I$ and not $\bar{I}$. This gives $i_2 - 2$ pairs.

Continuing in this way to the end, we obtain that the number of pairs $(i, \bar{\imath})$ forming an inversion is equal to $(i_1 - 1) + (i_2 - 2) + \cdots + (i_m - m)$, that is, equal to $|I| - \mu$, where $\mu = 1 + \cdots + m = \tfrac{1}{2} m(m + 1)$. Consequently, we finally obtain the formula $\varphi_I \wedge \varphi_{\bar{I}} = (-1)^{|I| - \mu} \varphi_K$, where $K = (1, \ldots, n)$.
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The exterior product cpj A <pj* is equal to zéro for ail J and J' , with the excep- 
tion only of the case that J' — J, that is, the collections J and J' are disjoint and 
complément each other. By what we hâve said above, cpj A cp-j = (— 

Thus from (10.70) we obtain the equality

$$\sum_{J \in \mathbb{N}_n^m} M_{IJ}\, M_{\bar I \bar J}\, (-1)^{|J| - \mu}\, \varphi_K = |A|\, (-1)^{|I| - \mu}\, \varphi_K. \tag{10.71}$$

Multiplying both sides of equality (10.71) by the number $(-1)^{|I| + \mu}$ and taking into
account the obvious identity $(-1)^{2|I|} = 1$, we finally obtain

$$\sum_{J \in \mathbb{N}_n^m} M_{IJ}\, M_{\bar I \bar J}\, (-1)^{|I| + |J|} = |A|,$$

which, taking into account definition (10.66), gives us the required equality. □
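The equality just proved can be checked numerically. The following is only a minimal sketch, assuming a small integer matrix chosen for illustration; the helper names `det`, `minor`, and `laplace_expansion` are hypothetical and do not come from the text. Indices are 0-based in the code, which shifts $|I| + |J|$ by the even number $2m$ and therefore leaves the sign unchanged.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if not m:
        return 1  # determinant of the empty 0 x 0 matrix
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def minor(a, rows, cols):
    """Minor M_{IJ}: determinant of the submatrix on rows I and columns J."""
    return det([[a[i][j] for j in cols] for i in rows])


def laplace_expansion(a, rows):
    """Sum over J of M_{IJ} * M_{complement(I), complement(J)} * (-1)^(|I|+|J|)."""
    n = len(a)
    rows = list(rows)
    other_rows = [i for i in range(n) if i not in rows]
    total = 0
    for cols in combinations(range(n), len(rows)):
        other_cols = [j for j in range(n) if j not in cols]
        sign = (-1) ** (sum(rows) + sum(cols))
        total += sign * minor(a, rows, cols) * minor(a, other_rows, other_cols)
    return total


# Illustrative matrix (an assumption, not taken from the text).
A = [[2, -1, 0, 3],
     [1, 4, 5, -2],
     [0, 7, 1, 1],
     [3, 2, -4, 6]]

# The expansion along every fixed collection of rows I reproduces |A|.
for m in (1, 2, 3):
    for I in combinations(range(4), m):
        assert laplace_expansion(A, I) == det(A)
```

For $m = 1$ the sum reduces to the usual expansion of the determinant along a single row.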

Example 10.35 We began this section with Example 10.30, in which we investigated
in detail the space $\Lambda^p(L)$ for $p = n$. Let us now consider the case $p = n - 1$. As a
result of the general relationship $\dim \Lambda^p(L) = C_n^p$, we obtain that $\dim \Lambda^{n-1}(L) = n$.

Having chosen an arbitrary basis $e_1, \ldots, e_n$ in the space $L$, we assign to every
vector $z \in \Lambda^{n-1}(L)$ the linear function $f(x)$ on $L$ defined by the condition

$$z \wedge x = f(x)\,(e_1 \wedge \cdots \wedge e_n), \qquad x \in L.$$

For this, it is necessary to recall that $z \wedge x$ belongs to the one-dimensional space
$\Lambda^n(L)$, and the vector $e_1 \wedge \cdots \wedge e_n$ constitutes a basis there. The linearity of the
function $f(x)$ follows from the properties of the exterior product proved above. Let
us verify that the linear transformation

$$\mathcal{F} : \Lambda^{n-1}(L) \to L^*$$

thus constructed is an isomorphism. Since $\dim \Lambda^{n-1}(L) = \dim L^* = n$, to show this,
it suffices to verify that the kernel of the transformation $\mathcal{F}$ is equal to $(0)$. As we
know, it is possible to select as the basis of the space $\Lambda^{n-1}(L)$ the vectors

$$e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_{n-1}}, \qquad i_k \in \{1, \ldots, n\},$$

where the collection $(i_1, \ldots, i_{n-1})$ is determined uniquely up to a permutation and
consists of all the numbers $(1, \ldots, n)$ except for one. This means that as the basis of
$\Lambda^{n-1}(L)$ one can choose the vectors

$$u_i = e_1 \wedge \cdots \wedge e_{i-1} \wedge \breve{e}_i \wedge e_{i+1} \wedge \cdots \wedge e_n, \qquad i = 1, \ldots, n, \tag{10.72}$$

where the mark over $e_i$ indicates that this factor is omitted.

It is clear that $u_i \wedge e_j = 0$ if $i \neq j$, and $u_i \wedge e_i = \pm\, e_1 \wedge \cdots \wedge e_n$ for all $i = 1, \ldots, n$.

Let us assume that $z \in \Lambda^{n-1}(L)$ is a nonnull vector such that its associated linear
function $f(x)$ is equal to zero for every $x \in L$. Let us set $z = z_1 u_1 + \cdots + z_n u_n$.
Then from our assumption, it follows that $z \wedge x = 0$ for all $x \in L$, and in particular,
for the vectors $e_1, \ldots, e_n$. It is easy to see that from this follow the equalities $z_1 = 0$,
$\ldots$, $z_n = 0$, and hence $z = 0$, contrary to the assumption.

The constructed isomorphism $\mathcal{F} : \Lambda^{n-1}(L) \to L^*$ is a refinement of the following
fact that we encountered earlier: the Plücker coordinates of a hyperplane can be
arbitrary numbers; in this dimension, the Plücker relations do not yet appear.

Let us now assume that the space $L$ is an oriented Euclidean space. On the one
hand, this determines a fixed basis (10.58) in $\Lambda^n(L)$ if $e_1, \ldots, e_n$ is an arbitrary
positively oriented orthonormal basis of $L$, so that the isomorphism $\mathcal{F} : \Lambda^{n-1}(L) \to L^*$
constructed above is uniquely determined. On the other hand, for a Euclidean
space, there is defined the standard isomorphism $L^* \simeq L$, which does not require the
selection of any basis at all in $L$ (see p. 214). Combining these two isomorphisms,
we obtain the isomorphism

$$\mathcal{G} : \Lambda^{n-1}(L) \simeq L,$$

which assigns to the element $z \in \Lambda^{n-1}(L)$ the vector $x \in L$ such that

$$z \wedge y = (x, y)\,(e_1 \wedge \cdots \wedge e_n) \tag{10.73}$$

for every vector $y \in L$ and for the positively oriented orthonormal basis $e_1, \ldots, e_n$,
where $(x, y)$ denotes the inner product in the space $L$.

Let us consider this isomorphism in greater detail. We saw earlier that the vectors
$u_i$ determined by formula (10.72) form a basis of the space $\Lambda^{n-1}(L)$. To describe the
constructed isomorphism, it suffices to determine which vector $b \in L$ corresponds
to the vector $a_1 \wedge \cdots \wedge a_{n-1}$, $a_i \in L$. We may suppose that the vectors $a_1, \ldots, a_{n-1}$
are linearly independent, since otherwise, the vector $a_1 \wedge \cdots \wedge a_{n-1}$ would equal $0$,
and therefore to it would correspond the vector $b = 0$. Taking into account formula
(10.73), this correspondence implies the equality

$$(b, y)\,(e_1 \wedge \cdots \wedge e_n) = a_1 \wedge \cdots \wedge a_{n-1} \wedge y, \tag{10.74}$$

satisfied by all $y \in L$. Since the vector on the right-hand side of (10.74) is the
null vector if $y$ belongs to the subspace $L_1 = \langle a_1, \ldots, a_{n-1} \rangle$, we may assume that
$b \in L_1^{\perp}$.

Now we must recall that we have an orientation and consider $L$ and $L_1$ to be oriented
(it is easy to ascertain that the orientation of the space $L$ does not determine
a natural orientation of the subspace $L_1$, and so we must choose and fix the orientation
of $L_1$ separately). Then we may choose the basis $e_1, \ldots, e_n$ in such a way that
it is orthonormal and positively oriented and also such that the first $n - 1$ vectors
$e_1, \ldots, e_{n-1}$ belong to the subspace $L_1$ and define in it an orthonormal and
positively oriented basis (it is always possible to attain this, possibly after replacing
the vector $e_n$ with its opposite).

Since the vector $b$ is contained in the one-dimensional subspace $L_1^{\perp} = \langle e_n \rangle$, it
follows that $b = \beta e_n$. Using the previous arguments, applied to the subspace $L_1$
with its basis $e_1, \ldots, e_{n-1}$, we obtain that

$$a_1 \wedge \cdots \wedge a_{n-1} = v(a_1, \ldots, a_{n-1})\; e_1 \wedge \cdots \wedge e_{n-1},$$

where $v(a_1, \ldots, a_{n-1})$ is the oriented volume of the parallelepiped spanned by the
vectors $a_1, \ldots, a_{n-1}$ (see the definition on p. 221). This observation determines the
number $\beta$.

Indeed, substituting the vector $y = e_n$ into (10.74) and taking into account the
fact that the basis $e_1, \ldots, e_n$ was chosen to be orthonormal and positively oriented
(from which follows, in particular, the equality $v(e_1, \ldots, e_n) = 1$), we obtain the
relationship

$$\beta = v(a_1, \ldots, a_{n-1}, e_n) = v(a_1, \ldots, a_{n-1}).$$

Thus the isomorphism $\mathcal{G}$ constructed above assigns to the vector $a_1 \wedge \cdots \wedge a_{n-1}$
the vector $b = v(a_1, \ldots, a_{n-1})\, e_n$, where $e_n$ is the unit vector on the line $L_1^{\perp}$, chosen
with the sign making the basis $e_1, \ldots, e_n$ of the space $L$ orthonormal and positively
oriented. As is easily verified, this is equivalent to the requirement that the basis
$a_1, \ldots, a_{n-1}, e_n$ be positively oriented.

The final resuit is contained in the following theorem. 

Theorem 10.36 For every oriented Euclidean space $L$, the isomorphism

$$\mathcal{G} : \Lambda^{n-1}(L) \simeq L$$

assigns to the vector $a_1 \wedge \cdots \wedge a_{n-1}$ the vector $b \in L$, which is orthogonal to
the vectors $a_1, \ldots, a_{n-1}$ and whose length is equal to the unoriented volume
$V(a_1, \ldots, a_{n-1})$, or more precisely,

$$b = V(a_1, \ldots, a_{n-1})\, e, \tag{10.75}$$

where $e \in L$ is a vector of unit length orthogonal to the vectors $a_1, \ldots, a_{n-1}$ and
chosen in such a way that the basis $a_1, \ldots, a_{n-1}, e$ is positively oriented.

The vector $b$ determined by the relationship (10.75) is called the vector product
of the vectors $a_1, \ldots, a_{n-1}$ and is denoted by $[a_1, \ldots, a_{n-1}]$. In the case $n = 3$, this
definition gives us the vector product of two vectors $[a_1, a_2]$ familiar from analytic
geometry.
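As an illustration, the vector product can be computed in coordinates. This is only a sketch under stated assumptions: coordinates are taken in a positively oriented orthonormal basis, so that by (10.74) the inner product $(b, y)$ equals the determinant of the matrix with rows $a_1, \ldots, a_{n-1}, y$, and expanding that determinant along its last row gives the coordinates of $b$ as signed minors. The function names `det`, `dot`, and `vector_product` are hypothetical.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def dot(u, v):
    """Inner product in an orthonormal basis."""
    return sum(x * y for x, y in zip(u, v))


def vector_product(vectors):
    """Coordinates of [a_1, ..., a_{n-1}] for n-1 vectors of length n."""
    vs = [list(v) for v in vectors]
    n = len(vs) + 1
    if any(len(v) != n for v in vs):
        raise ValueError("expected n-1 vectors with n coordinates each")
    b = []
    for j in range(n):
        sub = [v[:j] + v[j + 1:] for v in vs]      # delete column j
        b.append((-1) ** (n + j + 1) * det(sub))   # cofactor of the last row
    return b


# n = 3 recovers the familiar vector product of two vectors.
assert vector_product([[1, 2, 3], [4, 5, 6]]) == [-3, 6, -3]

# n = 4: b is orthogonal to a_1, a_2, a_3, and the basis a_1, a_2, a_3, b
# is positively oriented, since det[a_1, a_2, a_3, b] = (b, b) > 0.
a = [[1, 2, 0, 1], [0, 1, 1, 0], [2, 0, 1, 3]]
b = vector_product(a)
assert all(dot(b, v) == 0 for v in a)
assert det(a + [b]) == dot(b, b) > 0
```

The last assertion is the relationship (10.74) evaluated at $y = b$.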


Chapter 11 

Quadrics 


We have encountered a number of types of spaces consisting of points (affine, affine
Euclidean, projective). For all of these spaces, an interesting and important question
has been the study of quadrics contained in such spaces, that is, sets of points with
coordinates $(x_1, \ldots, x_n)$ that in some coordinate system satisfy the single equation

$$F(x_1, \ldots, x_n) = 0, \tag{11.1}$$

where $F$ is a second-degree polynomial in the variables $x_1, \ldots, x_n$. Let us focus our
attention on the fact that by the definition of a polynomial, it is possible in general
for there to be present in equation (11.1) both first- and second-degree monomials
as well as a constant term.

For each of the spaces of the above-mentioned types, a trivial verification shows
that the property of a set of points being a quadric does not depend on the choice of
coordinate system. In other words, a nonsingular affine transformation, motion,
or projective transformation (depending on the type of space under consideration)
takes a quadric to a quadric.


11.1 Quadrics in Projective Space 


By the definition given above, a quadric $Q$ in the projective space $P(L)$ is given by
equation (11.1) in homogeneous coordinates. However, as we saw in Chap. 9, such
an equation is satisfied by the homogeneous coordinates of a point of the projective
space $P(L)$ only if its left-hand side is homogeneous.

Definition 11.1 A quadric in a projective space $P(L)$ is a set $Q$ consisting of points
defined by equation (11.1), where $F$ is a homogeneous second-degree polynomial,
that is, a quadratic form in the coordinates $x_0, x_1, \ldots, x_n$.

In Sect. 6.2, it was proved that in some coordinate system (that is, in some basis
of the space $L$), equation (11.1) is reduced to canonical form

$$\lambda_0 x_0^2 + \lambda_1 x_1^2 + \cdots + \lambda_r x_r^2 = 0,$$

where all the coefficients $\lambda_i$ are nonzero. Here the number $r \le n$ is determined by the rank
of the quadratic form $F$ (the rank is the number $r + 1$ of nonzero coefficients), and it is
the same for every system of coordinates in which the form $F$ is reduced to canonical
form. In the sequel, we shall assume that the quadratic form $F$ is nonsingular, that is,
that $r = n$. We shall also call the associated quadric $Q$ nonsingular. The canonical form
of its equation can then be written as follows:

$$\alpha_0 x_0^2 + \alpha_1 x_1^2 + \cdots + \alpha_n x_n^2 = 0, \tag{11.2}$$

where all the coefficients $\alpha_i$ are nonzero. The general case differs from (11.2) only
in the omission of terms containing $x_i$ with $i = r + 1, \ldots, n$. It is therefore easily
reduced to the case of a nonsingular quadric.

We have already encountered the concept of a tangent space to an arbitrary
smooth hypersurface (in Chap. 7) or to a projective algebraic variety (in Chap. 9).
Now we move on to a consideration of the notion of the tangent space to a quadric.


Definition 11.2 If $A$ is a point on the quadric $Q$ given by equation (11.1), then the
tangent space to $Q$ at the point $A \in Q$ is defined as the projective space $T_A Q$ given
by the equation

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, x_i = 0. \tag{11.3}$$


The tangent space is an important general mathematical concept, and we shall
now discuss it in the greatest possible generality. Within the framework of a course
in algebra, it is natural to limit ourselves to the case in which $F$ is a homogeneous
polynomial of arbitrary degree $k > 0$. Then equation (11.1) defines in the space
$P(L)$ some hypersurface $X$, and if not all the partial derivatives $\frac{\partial F}{\partial x_i}(A)$ are equal to
zero, then equation (11.3) gives the tangent hyperplane to the hypersurface $X$ at the
point $A$. We see that in equation (11.3), on the left-hand side appears the differential
$d_A F(x)$ (see Example 3.86 on p. 130), and since this notion was defined so as to
be invariant with respect to the choice of coordinate system, the notion of tangent
space is also independent of such a choice. The tangent space to the hypersurface $X$
at the point $A$ is denoted by $T_A X$.

In the sequel, we shall always assume that quadrics are viewed as lying in spaces
over a field $K$ of characteristic different from 2 (for example, for definiteness, we
may assume that the field $K$ is either $\mathbb{R}$ or $\mathbb{C}$). If $F(x)$ is a quadratic form, then by
the assumptions we have made, we can write it in the form

$$F(x) = \sum_{i,j=0}^{n} a_{ij}\, x_i x_j, \tag{11.4}$$

where the coefficients satisfy $a_{ij} = a_{ji}$. In other words, $F(x) = \varphi(x, x)$, where

$$\varphi(x, y) = \sum_{i,j=0}^{n} a_{ij}\, x_i y_j \tag{11.5}$$

is a symmetric bilinear form (Theorem 6.6). If the point $A$ corresponds to the vector
$a$ with coordinates $(\alpha_0, \alpha_1, \ldots, \alpha_n)$, then

$$\frac{\partial F}{\partial x_i}(A) = 2 \sum_{j=0}^{n} a_{ij}\, \alpha_j,$$

and therefore, equation (11.3) takes the form

$$\sum_{i,j=0}^{n} a_{ij}\, \alpha_j x_i = 0,$$

or equivalently, $\varphi(a, x) = 0$. Thus in this case, the tangent hyperplane at the point
$A$ coincides with the orthogonal complement $\langle a \rangle^{\perp}$ to the vector $a \in L$ with respect
to the bilinear form $\varphi(x, y)$.
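The description of the tangent hyperplane as $\varphi(a, x) = 0$ is easy to try out in coordinates. The sketch below is an illustration only: the conic $x_1^2 + x_2^2 - x_0^2 = 0$ and the point $(5 : 3 : 4)$ are assumptions chosen for the example, and the helper names `phi` and `tangent_hyperplane` are hypothetical.

```python
def phi(m, x, y):
    """Symmetric bilinear form (11.5) with matrix m."""
    n = len(m)
    return sum(m[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def tangent_hyperplane(m, a):
    """Coefficients c_i of the equation sum_i c_i x_i = 0 of the tangent
    hyperplane at the point with coordinates a, that is, phi(a, x) = 0."""
    if phi(m, a, a) != 0:
        raise ValueError("the point does not lie on the quadric")
    n = len(m)
    coeffs = [sum(m[i][j] * a[j] for j in range(n)) for i in range(n)]
    if all(c == 0 for c in coeffs):
        raise ValueError("singular point: the tangent hyperplane is undefined")
    return coeffs


# Conic x1^2 + x2^2 - x0^2 = 0 (illustrative choice).
M = [[-1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
a = [5, 3, 4]                       # 9 + 16 - 25 = 0, so the point lies on the conic
c = tangent_hyperplane(M, a)        # tangent line -5*x0 + 3*x1 + 4*x2 = 0
assert c == [-5, 3, 4]
assert sum(ci * ai for ci, ai in zip(c, a)) == 0   # the tangent line passes through A
```

In the chart $x_0 = 1$ this is the tangent line $3y_1 + 4y_2 = 5$ to the circle $y_1^2 + y_2^2 = 1$ at the point $(3/5, 4/5)$.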

The definition of tangent space (11.3) loses sense if all the derivatives $\frac{\partial F}{\partial x_i}(A)$ are
equal to zero:

$$\frac{\partial F}{\partial x_i}(A) = 0, \qquad i = 0, 1, \ldots, n. \tag{11.6}$$

A point $A$ of the hypersurface $X$ given by equation (11.1) for which equalities (11.6)
are satisfied is called a singular or critical point. If a hypersurface has no singular
points, then it is said to be smooth. When the hypersurface $X$ is a quadric, that is,
the polynomial $F$ is a quadratic form (11.4), then equations (11.6) assume the form

$$\sum_{j=0}^{n} a_{ij}\, \alpha_j = 0, \qquad i = 0, 1, \ldots, n.$$

Since the point $A$ is in $P(L)$, it follows that not all of its coordinates $\alpha_i$ are equal to
zero. Thus singular points of a quadric $Q$ are the nonzero solutions of the system of
equations

$$\sum_{j=0}^{n} a_{ij}\, x_j = 0, \qquad i = 0, 1, \ldots, n. \tag{11.7}$$

As was shown in Chap. 2, such solutions exist only if the determinant of the matrix
$(a_{ij})$ is equal to zero, and that is equivalent to saying that the quadric $Q$ is singular.
Thus a nonsingular quadric is the same thing as a smooth quadric.

Let us consider the possible mutual relationships between a quadric $Q$ and a line
$l$ in projective space $P(L)$. First, let us show that either the line $l$ has not more than
two points in common with the quadric $Q$, or else it lies entirely in $Q$.

Indeed, if a line $l$ is not contained entirely in $Q$, then one can choose a point
$A \in l$, $A \notin Q$. Let the line $l$ correspond to some plane $L' \subset L$, that is, $l = P(L')$. If
$A = \langle a \rangle$, then $L' = \langle a, b \rangle$, where the vector $b \in L$ is not collinear with the vector $a$.
In other words, the plane $L'$ consists of all vectors of the form $xa + yb$, where $x$ and
$y$ range over all possible scalars. The points of intersection of the line $l$ and the quadric
$Q$ are found from the equation $F(xa + yb) = 0$, that is, from the equation

$$F(xa + yb) = \varphi(xa + yb,\, xa + yb) = F(a)\,x^2 + 2\varphi(a, b)\,xy + F(b)\,y^2 = 0 \tag{11.8}$$

in the variables $x, y$. The vectors $xa + yb$ with $y = 0$ give us the point $A \notin Q$. Assuming,
therefore, that $y \neq 0$, we set $t = x/y$. Then (11.8) gives us a quadratic
equation in the variable $t$:

$$F(xa + yb) = y^2 \bigl(F(a)\,t^2 + 2\varphi(a, b)\,t + F(b)\bigr) = 0.$$

The condition $A \notin Q$ has the form $F(a) \neq 0$. Consequently, the leading coefficient
of the quadratic trinomial $F(a)\,t^2 + 2\varphi(a, b)\,t + F(b)$ is nonzero, and therefore,
the quadratic trinomial itself is not identically zero and cannot have more than two
roots.

Let us now consider the mutual arrangement of $Q$ and $l$ if the line $l$ passes
through the point $A \in Q$. Then, as in the previous case, the points of $l \cap Q$ correspond to the
solutions of the quadratic equation (11.8), in which $F(a) = 0$, since $A \in Q$. Thus we
obtain the equation

$$F(xa + yb) = 2\varphi(a, b)\,xy + F(b)\,y^2 = y\,\bigl(2\varphi(a, b)\,x + F(b)\,y\bigr) = 0. \tag{11.9}$$

One solution of equation (11.9) is obvious: $y = 0$. It corresponds precisely to the
point $A \in Q$. This solution is unique if and only if $\varphi(a, b) = 0$, that is, if $b \in T_A Q$.
In the latter case, clearly $l \subset T_A Q$, and one says that the line $l$ is tangent to the
quadric $Q$ at the point $A$.

Thus there are four possible cases of the relationship between a nonsingular
quadric $Q$ and a line $l$:

(1) The line $l$ has no points in common with the quadric $Q$.

(2) The line $l$ has precisely two distinct points in common with the quadric $Q$.

(3) The line $l$ has exactly one point $A$ in common with the quadric $Q$, which is
possible if and only if $l \subset T_A Q$.

(4) The line $l$ lies entirely in $Q$.
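The four cases can be told apart from the three coefficients $F(a)$, $\varphi(a, b)$, $F(b)$ of the binary form in (11.8). The following sketch is an illustration under an explicit assumption: the field is $\mathbb{R}$ (or $\mathbb{Q}$), so that the sign of the discriminant $\varphi(a, b)^2 - F(a)F(b)$ makes sense; over $\mathbb{C}$ case (1) does not occur. The names `phi` and `classify_line` are hypothetical, and the line is given by any two non-collinear vectors $a$, $b$ spanning the plane $L'$.

```python
def phi(m, x, y):
    """Symmetric bilinear form (11.5) with matrix m."""
    n = len(m)
    return sum(m[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def classify_line(m, a, b):
    """Position of the line P(<a, b>) relative to the quadric with matrix m,
    over an ordered field (an assumption of this sketch)."""
    p, q, r = phi(m, a, a), phi(m, a, b), phi(m, b, b)
    if p == 0 and q == 0 and r == 0:
        return "contained"            # case (4): the form (11.8) vanishes identically
    d = q * q - p * r                 # discriminant of p*x^2 + 2*q*x*y + r*y^2
    if d > 0:
        return "two points"           # case (2)
    if d == 0:
        return "tangent"              # case (3): one common point
    return "no points"                # case (1)


circle = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]        # x1^2 + x2^2 - x0^2 = 0
assert classify_line(circle, [1, 0, 0], [0, 0, 1]) == "two points"   # line x1 = 0
assert classify_line(circle, [1, 1, 0], [0, 0, 1]) == "tangent"      # line x1 = x0
assert classify_line(circle, [1, 2, 0], [0, 0, 1]) == "no points"    # line x1 = 2*x0

# A quadric in three-dimensional projective space containing a line.
q4 = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
assert classify_line(q4, [1, 0, 0, 0], [0, 1, 0, 0]) == "contained"
```

The result does not depend on which spanning vectors $a$, $b$ of the plane are chosen, since a change of basis only multiplies the discriminant by a nonzero square.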

Of course, there also exist smooth hypersurfaces defined by equation (11.1) of
arbitrary degree $k \ge 1$. For example, such a hypersurface is given by the equation
$c_0 x_0^k + c_1 x_1^k + \cdots + c_n x_n^k = 0$, where all the $c_i$ are nonzero. In the sequel, we shall
consider only smooth hypersurfaces. For these, the left-hand side of equation (11.3)
is a nonnull linear form on the vector space $L$, and this means that it determines a
hyperplane in $L$ and in $P(L)$.

Let us verify that this hyperplane contains the point $A$. This means that if the
point $A$ corresponds to the vector $a = (\alpha_0, \alpha_1, \ldots, \alpha_n)$, then

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, \alpha_i = 0.$$

If the degree of the homogeneous polynomial $F$ is equal to $k$, then by Euler's identity
(3.68), we have the equality

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, \alpha_i = k\, F(A).$$

The value of $F(A)$ is equal to zero, since the point $A$ lies on the hypersurface $X$
given by the equation $F(x) = 0$.

Now to switch to a more familiar situation, let us consider the affine subset of
$P(L)$ given by the condition $x_0 \neq 0$, and let us introduce in it the inhomogeneous
coordinates

$$y_i = x_i / x_0, \qquad i = 1, \ldots, n. \tag{11.10}$$

Let us assume that the point $A$ lies in this subset (that is, its coordinate $\alpha_0$ is nonzero),
and let us write equation (11.3) in the coordinates $y_i$. To do so, we must move from
the variables $x_0, x_1, \ldots, x_n$ to the variables $y_1, \ldots, y_n$ and rewrite equation (11.3)
accordingly. Here we must set

$$F(x_0, x_1, \ldots, x_n) = x_0^k\, f(y_1, \ldots, y_n), \tag{11.11}$$

where $f(y_1, \ldots, y_n)$ is a polynomial of degree $k \ge 1$ that is no longer necessarily
homogeneous (in contrast to $F$). In accord with formula (11.10), let us denote by
$a_1, \ldots, a_n$ the inhomogeneous coordinates of the point $A$, that is,

$$a_i = \alpha_i / \alpha_0, \qquad i = 1, \ldots, n.$$

Using general rules for the calculation of partial derivatives, from the representation
(11.11), taking into account (11.10), we obtain the formulas

$$\frac{\partial F}{\partial x_0} = k\, x_0^{k-1} f - x_0^{k-1} \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}\, y_i, \qquad \frac{\partial F}{\partial x_i} = x_0^{k-1}\, \frac{\partial f}{\partial y_i}, \quad i = 1, \ldots, n.$$

Now let us find the values of the derivatives calculated above of the function $F$ at
the point $A$ with inhomogeneous coordinates $a_1, \ldots, a_n$. The value of $F(A)$ is zero,
since the point $A$ lies in the hypersurface $X$, and $x_0 \neq 0$. By virtue of the representation
(11.11), we obtain from this that $f(a_1, \ldots, a_n) = 0$. For brevity, we shall employ
the notation $f(A) = f(a_1, \ldots, a_n)$ and $\frac{\partial f}{\partial y_i}(A) = \frac{\partial f}{\partial y_i}(a_1, \ldots, a_n)$. Thus from
the two previous relationships, we obtain

$$\frac{\partial F}{\partial x_0}(A) = -\alpha_0^{k-1} \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\, a_i, \qquad \frac{\partial F}{\partial x_i}(A) = \alpha_0^{k-1}\, \frac{\partial f}{\partial y_i}(A), \quad i = 1, \ldots, n. \tag{11.12}$$

On substituting expression (11.12) into (11.3), and taking into account (11.10), we
obtain the equation

$$-\alpha_0^{k-1} \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\, a_i\, x_0 + \sum_{i=1}^{n} \Bigl(\alpha_0^{k-1}\, \frac{\partial f}{\partial y_i}(A)\Bigr) x_i = \alpha_0^{k-1} x_0 \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\,(y_i - a_i) = 0.$$

Canceling the nonzero common factor $\alpha_0^{k-1} x_0$, we finally obtain

$$\sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\,(y_i - a_i) = 0. \tag{11.13}$$

This is precisely the equation of the tangent hyperplane $T_A X$ in inhomogeneous
coordinates. In analysis and geometry, it is written in the form (11.13) for a function
$f$ of a much more general class than that of polynomials.

We may now return to the case in which the hypersurface $X = Q$ is a nonsingular
(and therefore smooth) quadric. Then for every point $A \in Q$, equation (11.3)
determines a hyperplane in $L$, that is, some line in the dual space $L^*$, and therefore a
point belonging to the space $P(L^*)$, which we shall denote by $\Phi(A)$. Thus we define
the mapping

$$\Phi : Q \to P(L^*). \tag{11.14}$$

Our first task consists in determining what the set $\Phi(Q) \subset P(L^*)$ in fact is. For
this, we express the quadratic form $F(x)$ in the form $F(x) = \varphi(x, x)$, where the
symmetric bilinear form $\varphi(x, y)$ has the form (11.5). By Theorem 6.3, we can write
$\varphi(x, y)$ uniquely as $\varphi(x, y) = (x, \mathcal{A}(y))$, where $\mathcal{A} : L \to L^*$ is some linear
transformation. From the definitions, it follows that here, the radical of the form $\varphi$ coincides
with the kernel of the linear transformation $\mathcal{A}$. Since in the case of a nonsingular
form $F$, the radical is equal to $(0)$, it follows that the kernel of $\mathcal{A}$ is also equal to
$(0)$. Since $\dim L = \dim L^*$, we have by Theorem 3.68 that the linear transformation
$\mathcal{A}$ is an isomorphism, and there is thereby determined a projective transformation

$$P(\mathcal{A}) : P(L) \to P(L^*).$$

Let us now write down our mapping (11.14) in coordinates. If the quadratic form
$F(x)$ is written in the form (11.4), then

$$\frac{\partial F}{\partial x_i} = 2 \sum_{j=0}^{n} a_{ij}\, x_j, \qquad i = 0, 1, \ldots, n.$$

On the other hand, in some basis $e_0, e_1, \ldots, e_n$ of the space $L$, the bilinear form
$\varphi(x, y)$ has the form (11.5), where the vectors $x$ and $y$ are given by $x = x_0 e_0 +
\cdots + x_n e_n$ and $y = y_0 e_0 + \cdots + y_n e_n$. From this, it follows that the matrix of the
transformation $\mathcal{A} : L \to L^*$ in the basis $e_0, e_1, \ldots, e_n$ of the space $L$ and in the dual
basis $f_0, f_1, \ldots, f_n$ of the space $L^*$ is equal to $(a_{ij})$. Therefore, to the quadratic
form $F(x)$ is associated the isomorphism $\mathcal{A} : L \to L^*$, and the mapping (11.14)
that we constructed coincides with the restriction of the projective transformation
$P(\mathcal{A}) : P(L) \to P(L^*)$ to $Q$, that is, $\Phi(Q) = P(\mathcal{A})(Q)$.

From this arises an unexpected consequence: since the transformation $P(\mathcal{A})$ is a
bijection, the transformation (11.14) is also a bijection onto its image. In other words, the tangent
hyperplanes to the nonsingular quadric $Q$ at distinct points $A, B \in Q$ are distinct.
Thus we obtain the following result.

Lemma 11.3 The same hyperplane cannot coincide with the tangent hyperplanes
to a nonsingular quadric $Q$ at two distinct points.

This means that in writing a hyperplane of the space $P(L)$ in the form $T_A Q$, we
may omit the point $A$. And in the case of a nonsingular quadric $Q$, it makes sense
to say that the hyperplane is tangent to the quadric, and moreover, the point of
tangency $A \in Q$ is uniquely determined.

Let us now consider more concretely what the set $\Phi(Q)$ looks like. We shall
show that it is also a nonsingular quadric, that is, in some (and therefore in any)
basis of the space $L^*$ it is determined by the equation $q(x) = 0$, where $q$ is a nonsingular
quadratic form.

We saw above that there is an isomorphism $\mathcal{A} : L \to L^*$ that bijectively maps $Q$
to $\Phi(Q)$. Therefore, there exists as well an inverse transformation $\mathcal{A}^{-1} : L^* \to L$,
which is also an isomorphism. Then the condition $y \in \Phi(Q)$ is equivalent to
$\mathcal{A}^{-1}(y) \in Q$. Let us choose an arbitrary basis

$$f_0, f_1, \ldots, f_n \tag{11.15}$$

in the space $L^*$. The isomorphism $\mathcal{A}^{-1} : L^* \to L$ carries this basis to the basis

$$\mathcal{A}^{-1}(f_0), \mathcal{A}^{-1}(f_1), \ldots, \mathcal{A}^{-1}(f_n) \tag{11.16}$$

of the space $L$. Here obviously the coordinates of the vector $\mathcal{A}^{-1}(y)$ in the basis
(11.16) coincide with the coordinates of the vector $y$ in the basis (11.15). As we
saw above, the condition $\mathcal{A}^{-1}(y) \in Q$ is equivalent to the relationship

$$F(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0, \tag{11.17}$$

where $F$ is a nonsingular quadratic form, and $(\alpha_0, \alpha_1, \ldots, \alpha_n)$ are the coordinates
of the vector $\mathcal{A}^{-1}(y)$ in some basis of the space $L$, for instance, in the basis (11.16).
This means that the condition $y \in \Phi(Q)$ can be expressed by the same relationship
(11.17). Thus we have proved the following statement.

Theorem 11.4 If $Q$ is a nonsingular quadric in the space $P(L)$, then the set of
tangent hyperplanes to it forms a nonsingular quadric in the space $P(L^*)$.
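In coordinates, Theorem 11.4 can be illustrated as follows. If $M = (a_{ij})$ is the matrix of $F$ and a point of $Q$ has coordinate column $a$, then the tangent hyperplane has coefficient vector $\xi = Ma$, and $\xi^{T} M^{-1} \xi = a^{T} M a = 0$, so in this coordinate description the tangent hyperplanes satisfy the quadratic equation with matrix $M^{-1}$. The sketch below is hedged accordingly: it uses the adjugate matrix (a nonzero multiple of $M^{-1}$, hence defining the same quadric) to stay within integer arithmetic, the matrix and points are illustrative assumptions, and the helper names are hypothetical.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def adjugate(m):
    """Transposed matrix of cofactors; equals det(m) times the inverse of m."""
    n = len(m)

    def cofactor(i, j):
        sub = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
        return (-1) ** (i + j) * det(sub)

    return [[cofactor(j, i) for j in range(n)] for i in range(n)]


def apply(m, x):
    """Matrix-vector product."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


def form(m, x):
    """Value of the quadratic form with matrix m at the vector x."""
    return sum(xi * yi for xi, yi in zip(x, apply(m, x)))


# Nonsingular conic 2*x0^2 + 2*x0*x1 - x2^2 = 0 (illustrative choice).
M = [[2, 1, 0],
     [1, 0, 0],
     [0, 0, -1]]
points_on_Q = [[1, 1, 2], [1, 7, 4], [0, 1, 0]]

dual = adjugate(M)
for a in points_on_Q:
    assert form(M, a) == 0          # the point lies on Q
    xi = apply(M, a)                # coefficients of the tangent line at this point
    assert form(dual, xi) == 0      # the tangent line lies on the dual conic
```

Distinct points of the conic give non-proportional vectors $\xi$, in agreement with Lemma 11.3.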


Repeating verbatim the arguments presented in Sect. 9.1, we may extend the
duality principle formulated there. Namely, we can add to it some additional notions
that are dual to each other and can be interchanged so that the general assertion
formulated on p. 326 remains valid:

    nonsingular quadric in P(L)       |  nonsingular quadric in P(L*)
    point in a nonsingular quadric    |  hyperplane tangent to a nonsingular quadric

This (seemingly small) extension of the duality principle leads to completely
unexpected results. By way of an example, we shall introduce two famous theorems
that are duals of each other, that is, equivalent on the basis of the duality principle.
Yet the second of them was published 150 years after the first. These theorems relate
to quadrics in two-dimensional projective space, that is, in the projective plane. In
this case, a quadric is called a conic.¹

In the sequel, we shall use the following terminology. Let $Q$ be a nonsingular
conic, and let $A_1, \ldots, A_6$ be six distinct points of $Q$. This ordered (that is, their
order is significant) collection of points is called a hexagon inscribed in the conic $Q$.
For two distinct points $A$ and $B$ of the projective plane, their projective cover (that
is, the line passing through them) is denoted by $AB$ (cf. the definition on p. 325).
The six lines $A_1A_2, A_2A_3, \ldots, A_5A_6, A_6A_1$ are called the sides of the hexagon.²
Here the following pairs of sides will be called opposite sides: $A_1A_2$ and $A_4A_5$,
$A_2A_3$ and $A_5A_6$, $A_3A_4$ and $A_6A_1$.


Theorem 11.5 (Pascal's theorem) Pairs of opposite sides of an arbitrary hexagon
inscribed in a nonsingular conic intersect in three collinear points. See Fig. 11.1.


¹ A clarification of this term, that is, an explanation of what this has to do with a cone, will be given
somewhat later.

² Here we move away somewhat from the intuition of elementary geometry, where by a side we
mean not the entire line passing through two points, but only the segment connecting them. This
extended notion of a side is necessary if we wish to include the case of an arbitrary field $K$, for
instance, $K = \mathbb{C}$.

Fig. 11.1 Hexagon inscribed in a conic
Before formulating the dual theorem to Pascal's theorem, let us make a few comments.

With the selection of a homogeneous system of coordinates $(x_0 : x_1 : x_2)$ in the
projective plane, the equation of the conic $Q$ can be written in the form

$$F(x_0 : x_1 : x_2) = a_1 x_0^2 + a_2 x_0 x_1 + a_3 x_0 x_2 + a_4 x_1^2 + a_5 x_1 x_2 + a_6 x_2^2 = 0.$$

There are six coefficients on the right-hand side of this equation. If we have $k$ points
$A_1, \ldots, A_k$, then the condition of their belonging to the conic $Q$ reduces to the
relationships

$$F(A_i) = 0, \qquad i = 1, \ldots, k, \tag{11.18}$$

which yield a system consisting of $k$ linear homogeneous equations in the six unknowns
$a_1, \ldots, a_6$. We must find a nontrivial solution to this system. If we have
$k = 6$, then this question falls under Corollary 2.13 as a special case (and this explains
our interest in hexagons inscribed in a conic). By this corollary, we have still
to verify that the determinant of the system (11.18) for $k = 6$ is equal to zero. It is
Pascal's theorem that gives a geometric interpretation of this condition.

It is not difficult to show that it gives necessary and sufficient conditions for
six points $A_1, \ldots, A_6$ to lie on some conic if we restrict ourselves, first of all, to
nonsingular conics, and secondly, to such collections of six points that no three
of them are collinear (this is proved in any sufficiently rigorous course in analytic
geometry).
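Both conditions can be tried out on a concrete hexagon. The following sketch is an illustration only and rests on assumptions not found in the text: the conic is taken to be $x_1^2 + x_2^2 = x_0^2$, its points are produced by the rational parametrization $t \mapsto (1 + t^2 : 1 - t^2 : 2t)$, lines and intersection points in the projective plane are computed with the coordinate cross product, and all function names are hypothetical.

```python
def cross(u, v):
    """Line through two points, or intersection point of two lines, in P^2."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def det3(u, v, w):
    """3 x 3 determinant; it vanishes exactly when u, v, w are collinear."""
    return sum(ui * ci for ui, ci in zip(u, cross(v, w)))


def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def point_on_conic(t):
    """Point (1 + t^2 : 1 - t^2 : 2t) of the conic x1^2 + x2^2 = x0^2."""
    return [1 + t * t, 1 - t * t, 2 * t]


def pascal_points(p):
    """Intersections of the three pairs of opposite sides of the hexagon p."""
    def side(i, j):
        return cross(p[i], p[j])
    return [cross(side(0, 1), side(3, 4)),
            cross(side(1, 2), side(4, 5)),
            cross(side(2, 3), side(5, 0))]


def conic_system_row(x):
    """Row of the linear system (11.18) in the unknowns a_1, ..., a_6."""
    return [x[0] ** 2, x[0] * x[1], x[0] * x[2], x[1] ** 2, x[1] * x[2], x[2] ** 2]


hexagon = [point_on_conic(t) for t in (0, 1, 2, 3, -1, -2)]

# Pascal's theorem: the three intersection points are collinear.
X1, X2, X3 = pascal_points(hexagon)
assert det3(X1, X2, X3) == 0

# The determinant of the system (11.18) for k = 6 is equal to zero.
assert det([conic_system_row(x) for x in hexagon]) == 0
```

Integer parameters keep all coordinates integral, so both determinants are computed exactly.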

Now let us formulate the dual theorem to Pascal's theorem. Here six distinct
lines $L_1, \ldots, L_6$ tangent to a conic $Q$ will be called a hexagon circumscribed about
the conic. The points $L_1 \cap L_2$, $L_2 \cap L_3$, $L_3 \cap L_4$, $L_4 \cap L_5$, $L_5 \cap L_6$, and $L_6 \cap L_1$ are
called the vertices of the hexagon. Here the following pairs of vertices will be called
opposite: $L_1 \cap L_2$ and $L_4 \cap L_5$, $L_2 \cap L_3$ and $L_5 \cap L_6$, $L_3 \cap L_4$ and $L_6 \cap L_1$.

Theorem 11.6 (Brianchon's theorem) The lines connecting opposite vertices of an
arbitrary hexagon circumscribed about a nonsingular conic intersect at a common
point. See Fig. 11.2.

Fig. 11.2 Hexagon circumscribed about a conic



It is obvious that Brianchon's theorem is obtained from Pascal's theorem if we
replace in it all the concepts by their duals according to the rules given above. Thus
by virtue of the general duality principle, Brianchon's theorem follows from Pascal's
theorem. Pascal's theorem itself can be proved easily, but we will not present a
proof, since its logic is connected with another area, namely algebraic geometry.³
Here it is of interest to observe only that the duality principle makes it possible to
obtain certain results from others that appear at first glance to be entirely unrelated.
Indeed, Pascal proved his theorem in the seventeenth century (when he was 16 years
old), while Brianchon proved his theorem in the nineteenth century, more than 150
years later. And moreover, Brianchon used entirely different arguments (the general
duality principle was not yet understood at the time).


11.2 Quadrics in Complex Projective Space 

Let us now consider the projective space $P(L)$, where $L$ is a complex vector space,
and as before, let us limit ourselves to the case of nonsingular quadrics. As we saw
in Sect. 6.3 (formula (6.27)), a nonsingular quadratic form in a complex space has
the canonical form $x_0^2 + x_1^2 + \cdots + x_n^2$. This means that in some coordinate system,
the equation of a nonsingular quadric can be written as

$$x_0^2 + x_1^2 + \cdots + x_n^2 = 0, \tag{11.19}$$

that is, every nonsingular quadric can be transformed into the quadric (11.19) by
some projective transformation. In other words, in a complex projective space there
exists (defined up to a projective transformation) only one nonsingular quadric
(11.19). It is this quadric that we shall now investigate.

In view of what we have said above, it suffices to consider any one arbitrary
nonsingular quadric in the projective space $P(L)$ of a given dimension. For example,
we may choose the quadric given by the equation $F(x) = 0$, where the matrix of the
quadratic form $F(x)$ has the form

$$\begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 1 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix}. \tag{11.20}$$

A simple calculation shows that the determinant of the matrix (11.20) is equal to $+1$
or $-1$, that is, it is nonzero.

³ Such a proof can be found, for example, in the book Algebraic Curves, by Robert Walker
(Springer, 1978).

A fundamental topic that we shall study in this and the following sections is
projective subspaces contained in a quadric. Let the quadric $Q$ be given by the
equation $F(x) = 0$, where $x \in L$, and let a projective subspace have the form $P(L')$,
where $L'$ is a subspace of the vector space $L$. Then the projective subspace $P(L')$ is
contained in $Q$ if and only if $F(x) = 0$ for all vectors $x \in L'$.


Definition 11.7 A subspace $L' \subset L$ is said to be isotropic with respect to a quadratic
form $F$ if $F(x) = 0$ for all vectors $x \in L'$.


Let $\varphi$ be the symmetric bilinear form associated with the quadratic form $F$,
according to Theorem 6.6. Then by virtue of (6.14), we see that $\varphi(x, y) = 0$ for all
vectors $x, y \in L'$. Therefore, we shall also say that the subspace $L' \subset L$ is isotropic
with respect to the bilinear form $\varphi$.

We have already encountered the simplest example of isotropic subspaces, in
Sect. 7.7 in our study of pseudo-Euclidean spaces. There we encountered lightlike
(also called isotropic) vectors on which a quadratic form $(x^2)$ defining a
pseudo-Euclidean space becomes zero. Every nonnull lightlike vector $e$ clearly determines
a one-dimensional isotropic subspace $\langle e \rangle$.

The basic technique that will be used in this and the following sections consists in
reformulating our questions about subspaces contained in a quadric $F(x) = 0$
in terms of a vector space $L$, a symmetric bilinear form $\varphi(x, y)$ defined on $L$ and
corresponding to the quadratic form $F(x)$, and subspaces isotropic with respect to
$F$ and $\varphi$. Then everything is determined almost trivially on the basis of the simplest
properties of linear and bilinear forms.

Theorem 11.8 The dimension of an arbitrary isotropic subspace $L' \subset L$ relative to
an arbitrary nonsingular quadratic form $F$ does not exceed half of $\dim L$.

Proof Let us consider $(L')^{\perp}$, the orthogonal complement of the subspace $L' \subset L$
with respect to the bilinear form $\varphi(u, v)$ associated with $F(x)$. The quadratic form
$F(x)$ and bilinear form $\varphi(u, v)$ are nonsingular. Therefore, we have relationship
(7.75), from which follows the equality $\dim (L')^{\perp} = \dim L - \dim L'$.

That the space $L'$ is isotropic means that $L' \subset (L')^{\perp}$. From this we obtain the
inequality

$$\dim L' \le \dim (L')^{\perp} = \dim L - \dim L',$$

from which it follows that $\dim L' \le \frac{1}{2} \dim L$, as asserted in the theorem. □

In the sequel, we shall limit our study of isotropic subspaces to those of the greatest possible dimension, namely $\tfrac{1}{2} \dim \mathsf{L}$ when the number $\dim \mathsf{L}$ is even and $\tfrac{1}{2}(\dim \mathsf{L} - 1)$ when it is odd. The general case $\dim \mathsf{L}' \le \tfrac{1}{2} \dim \mathsf{L}$ is easily reduced to this limiting case and is studied completely analogously.

Let us consider some of the simplest cases, known from analytic geometry. 

Example 11.9 The simplest case of all is $\dim \mathsf{L} = 2$, and therefore, $\dim \mathbb{P}(\mathsf{L}) = 1$. In coordinates $(x_0 : x_1)$, the quadratic form with matrix (11.20) has the form $x_0 x_1$. Clearly, the quadric $x_0 x_1 = 0$ consists of two points $(0 : 1)$ and $(1 : 0)$, corresponding to the vectors $\boldsymbol{e}_1 = (0, 1)$ and $\boldsymbol{e}_2 = (1, 0)$ in the plane $\mathsf{L}$. Each of the two points determines an isotropic subspace $\mathsf{L}'_i = \langle \boldsymbol{e}_i \rangle$.

Example 11.10 Next in complexity is the case $\dim \mathsf{L} = 3$, and correspondingly, $\dim \mathbb{P}(\mathsf{L}) = 2$. In this case, we are dealing with quadrics in the projective plane; their points determine one-dimensional isotropic subspaces in $\mathsf{L}$ that therefore form a continuous family. (If the equation of the quadric is $F(x_0, x_1, x_2) = 0$, then in the space $\mathsf{L}$, it determines a quadratic cone whose generatrices are isotropic subspaces.)

Example 11.11 The following case corresponds to $\dim \mathsf{L} = 4$ and $\dim \mathbb{P}(\mathsf{L}) = 3$. These are quadrics in three-dimensional projective space. For isotropic subspaces $\mathsf{L}' \subset \mathsf{L}$, Theorem 11.8 gives $\dim \mathsf{L}' \le 2$. Isotropic subspaces of maximal dimension are obtained for $\dim \mathsf{L}' = 2$, that is, $\dim \mathbb{P}(\mathsf{L}') = 1$. These are projective lines lying on the quadric. In coordinates $(x_0 : x_1 : y_0 : y_1)$, the quadratic form with matrix (11.20) gives the equation

\[ x_0 y_0 + x_1 y_1 = 0. \tag{11.21} \]

We must find all two-dimensional isotropic subspaces $\mathsf{L}' \subset \mathsf{L}$. Let a basis of the two-dimensional subspace $\mathsf{L}'$ consist of vectors $\boldsymbol{e} = (a_0, a_1, b_0, b_1)$ and $\boldsymbol{e}' = (a'_0, a'_1, b'_0, b'_1)$. Then the fact that $\mathsf{L}'$ is isotropic is expressed, in view of formula (11.21), by the relationship

\[ (a_0 u + a'_0 v)(b_0 u + b'_0 v) + (a_1 u + a'_1 v)(b_1 u + b'_1 v) = 0, \tag{11.22} \]

which is satisfied identically for all $u$ and $v$. The left-hand side of equation (11.22) represents a quadratic form in the variables $u$ and $v$, which can be identically equal to zero only in the case that all its coefficients are equal to zero. Removing parentheses in (11.22), we obtain

\[ a_0 b_0 + a_1 b_1 = 0, \qquad a_0 b'_0 + a'_0 b_0 + a_1 b'_1 + a'_1 b_1 = 0, \qquad a'_0 b'_0 + a'_1 b'_1 = 0. \tag{11.23} \]

The first equation from (11.23) means that the rows $(a_0, a_1)$ and $(b_1, -b_0)$ are proportional. Since they cannot both be equal to zero simultaneously (then all coordinates of the basis vector $\boldsymbol{e}$ would be equal to zero, which is impossible), it follows that one of them is the product of the other and some (uniquely determined) scalar $\beta$. For definiteness, let $a_0 = \beta b_1$, $a_1 = -\beta b_0$ (the case $b_1 = \beta a_0$, $b_0 = -\beta a_1$ is considered analogously). In just the same way, from the third equation of (11.23), we obtain that $a'_0 = \gamma b'_1$, $a'_1 = -\gamma b'_0$ with some scalar $\gamma$. Substituting the relationships

\[ a_0 = \beta b_1, \qquad a_1 = -\beta b_0, \qquad a'_0 = \gamma b'_1, \qquad a'_1 = -\gamma b'_0 \tag{11.24} \]

into the second equation of (11.23), we obtain the equality $(\beta - \gamma)(b'_0 b_1 - b_0 b'_1) = 0$. Therefore, either $b'_0 b_1 - b_0 b'_1 = 0$ or $\gamma = \beta$.

In the first case, from the equality $b'_0 b_1 - b_0 b'_1 = 0$ it follows that the rows $(b_0, b'_0)$ and $(b_1, b'_1)$ are proportional, and we obtain the relationships $b_1 = -\alpha b_0$ and $b'_1 = -\alpha b'_0$ with some scalar $\alpha$ (the case $b_0 = -\alpha b_1$ and $b'_0 = -\alpha b'_1$ is considered similarly). Let us assume that $b_1$ and $b'_1$ are not both equal to zero. Then $\alpha \ne 0$, and taking into account the relationships (11.24), we obtain

\[ a_0 u + a'_0 v = \beta b_1 u + \gamma b'_1 v = -\alpha(\beta b_0 u + \gamma b'_0 v) = \alpha(a_1 u + a'_1 v), \]
\[ b_0 u + b'_0 v = -\alpha^{-1}(b_1 u + b'_1 v). \]

In the second case, let us suppose that $a_0$ and $a_1$ are not both equal to zero. Then $\beta \ne 0$, and taking into account relationship (11.24), we obtain

\[ a_0 u + a'_0 v = \beta(b_1 u + b'_1 v), \]
\[ b_0 u + b'_0 v = -\beta^{-1}(a_1 u + a'_1 v). \]

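Thus with the assumptions made, for an arbitrary vector of the subspace $\mathsf{L}'$ with coordinates $(x_0, x_1, y_0, y_1)$, we have either

\[ x_0 = \alpha x_1, \qquad y_0 = -\alpha^{-1} y_1 \tag{11.25} \]

or

\[ x_0 = \beta y_1, \qquad y_0 = -\beta^{-1} x_1, \tag{11.26} \]

where $\alpha$ and $\beta$ are certain nonzero scalars.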

In order to consider the excluded cases, namely $\alpha = 0$ ($b_1 = b'_1 = 0$) and $\beta = 0$ ($a_0 = a_1 = 0$), let us introduce points $(a : b) \in \mathbb{P}^1$ and $(c : d) \in \mathbb{P}^1$, that is, pairs of numbers that are not simultaneously equal to zero, and let us consider them as defined up to multiplication by one and the same nonzero scalar. Then as is easily verified, a homogeneous representation of relationships (11.25) and (11.26) that also includes both previously excluded cases will have the form

\[ a x_0 = b x_1, \qquad b y_0 = -a y_1 \tag{11.27} \]

and

\[ c x_0 = d y_1, \qquad d y_0 = -c x_1 \tag{11.28} \]

respectively. Indeed, equality (11.25) is obtained from (11.27) for $a = 1$ and $b = \alpha$, while (11.26) is obtained from (11.28) for $c = 1$ and $d = \beta$.

Relationships (11.27) give the isotropic plane $\mathsf{L}' \subset \mathsf{L}$ or the line $\mathbb{P}(\mathsf{L}')$ in $\mathbb{P}(\mathsf{L})$, which belongs to the quadric (11.21). It is determined by the point $(a : b) \in \mathbb{P}^1$. Thus we obtain one family of lines. Similarly, relationships (11.28) determine a second family of lines. Together, they give all the lines contained in our quadric (called a hyperboloid of one sheet). These lines are called the rectilinear generatrices of the hyperboloid.

On the basis of the formulas we have written down, it is easy to verify some properties known from analytic geometry: two distinct lines from one family of rectilinear generatrices do not intersect, while two lines from different families do intersect (at a single point). For every point of the hyperboloid, there is a line from each of the two families that passes through it.
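
The properties just stated can be checked numerically. The following Python sketch is an illustration only, not part of the original text: the function names `family_one`, `family_two`, `is_isotropic`, and `intersection_dim` are our own hypothetical helpers, and rational sample points stand in for arbitrary scalars. It writes down a basis of each plane (11.27) and (11.28), verifies that the plane is isotropic for the form $x_0 y_0 + x_1 y_1$, and computes the dimension of the intersection of two such planes.

```python
from fractions import Fraction


def phi(u, v):
    """Symmetric bilinear form with phi(x, x) = x0*y0 + x1*y1 in coordinates (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = u
    s0, s1, t0, t1 = v
    return Fraction(x0 * t0 + s0 * y0 + x1 * t1 + s1 * y1, 2)


def family_one(a, b):
    """Basis of the plane a*x0 = b*x1, b*y0 = -a*y1 of (11.27); (a, b) must not be (0, 0)."""
    return [(b, a, 0, 0), (0, 0, a, -b)]


def family_two(c, d):
    """Basis of the plane c*x0 = d*y1, d*y0 = -c*x1 of (11.28); (c, d) must not be (0, 0)."""
    return [(d, 0, 0, c), (0, d, -c, 0)]


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [p - factor * q for p, q in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_isotropic(basis):
    """A subspace is isotropic when phi vanishes on all pairs of basis vectors."""
    return all(phi(u, v) == 0 for u in basis for v in basis)


def intersection_dim(basis1, basis2):
    """dim(M ∩ M') = dim M + dim M' - dim(M + M')."""
    return rank(basis1) + rank(basis2) - rank(basis1 + basis2)


# Two planes of the same family meet only in 0 (the projective lines are disjoint);
# planes of different families meet in a line (the projective lines share a point).
same = intersection_dim(family_one(1, 2), family_one(1, 3))
mixed = intersection_dim(family_one(1, 2), family_two(1, 3))
```

In terms of the vector space $\mathsf{L}$, an intersection of dimension 0 corresponds to disjoint projective lines, and an intersection of dimension 1 to lines meeting in a single point.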

In the following section, we shall consider the general case of projective sub- 
spaces of maximum possible dimension on a nonsingular quadric of arbitrary di- 
mension in complex projective space. 


11.3 Isotropic Subspaces

Let $Q$ be a nonsingular quadric in a complex projective space $\mathbb{P}(\mathsf{L})$ given by the equation $F(\boldsymbol{x}) = 0$, where $F(\boldsymbol{x})$ is a nonsingular quadratic form on the space $\mathsf{L}$. In analogy to what we discussed in the previous section, we shall study $m$-dimensional subspaces $\mathsf{L}' \subset \mathsf{L}$ that are isotropic with respect to $F$, assuming that $\dim \mathsf{L} = 2m$ if $\dim \mathsf{L}$ is even, and $\dim \mathsf{L} = 2m + 1$ if $\dim \mathsf{L}$ is odd.

The special cases that we studied in the preceding section show that isotropic subspaces look different for different values of $\dim \mathsf{L}$. Thus for $\dim \mathsf{L} = 3$, we found one family of isotropic subspaces, continuously parameterized by the points of the quadric $Q$. For $\dim \mathsf{L} = 2$ or $4$, we found two such families. This leads to the idea that the number of continuously parameterized families of isotropic subspaces on a quadric depends on the parity of the number $\dim \mathsf{L}$. As we shall now see, such is indeed the case.

The cases of even and odd dimension will be treated separately. 

Case 1. Let us assume that $\dim \mathsf{L} = 2m$. Consequently, we are interested in isotropic subspaces $\mathsf{M} \subset \mathsf{L}$ of dimension $m$. (This is the most interesting case, since here we shall see how the families of lines on a hyperboloid of one sheet are generalized.)

Theorem 11.12 For every $m$-dimensional isotropic subspace $\mathsf{M} \subset \mathsf{L}$, there exists another $m$-dimensional isotropic subspace $\mathsf{N} \subset \mathsf{L}$ such that

\[ \mathsf{L} = \mathsf{M} \oplus \mathsf{N}. \tag{11.29} \]

Proof Our proof is by induction on the number $m$. For $m = 0$, the statement of the theorem is vacuously true.

Let us assume now that $m > 0$, and let us consider an arbitrary nonnull vector $\boldsymbol{e} \in \mathsf{M}$. Let $\varphi(\boldsymbol{x}, \boldsymbol{y})$ be the symmetric bilinear form associated with the quadratic form $F(\boldsymbol{x})$. Since the subspace $\mathsf{M}$ is isotropic, it follows that $\varphi(\boldsymbol{e}, \boldsymbol{e}) = 0$. In view of the nonsingularity of $F(\boldsymbol{x})$, the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is likewise nonsingular, and therefore, its radical is equal to $(\boldsymbol{0})$. Then the linear function $\varphi(\boldsymbol{e}, \boldsymbol{x})$ of a vector $\boldsymbol{x} \in \mathsf{L}$ is not identically equal to zero (otherwise, the vector $\boldsymbol{e}$ would be in the radical of $\varphi(\boldsymbol{x}, \boldsymbol{y})$, which is equal to $(\boldsymbol{0})$).

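Let $\boldsymbol{f} \in \mathsf{L}$ be a vector such that $\varphi(\boldsymbol{e}, \boldsymbol{f}) \ne 0$. Clearly, the vectors $\boldsymbol{e}, \boldsymbol{f}$ are linearly independent. Let us consider the plane $\mathsf{W} = \langle \boldsymbol{e}, \boldsymbol{f} \rangle$ and denote by $\varphi'$ the restriction of the bilinear form $\varphi$ to $\mathsf{W}$. In the basis $\boldsymbol{e}, \boldsymbol{f}$, the matrix of the bilinear form $\varphi'$ has the form

\[ \Phi' = \begin{pmatrix} 0 & \varphi(\boldsymbol{e}, \boldsymbol{f}) \\ \varphi(\boldsymbol{e}, \boldsymbol{f}) & \varphi(\boldsymbol{f}, \boldsymbol{f}) \end{pmatrix}, \qquad \varphi(\boldsymbol{e}, \boldsymbol{f}) \ne 0. \]

It is obvious that $|\Phi'| = -\varphi(\boldsymbol{e}, \boldsymbol{f})^2 \ne 0$, and therefore, the bilinear form $\varphi'$ is nonsingular.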

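Let us define the vector

\[ \boldsymbol{g} = \boldsymbol{f} - \frac{\varphi(\boldsymbol{f}, \boldsymbol{f})}{2 \varphi(\boldsymbol{e}, \boldsymbol{f})}\, \boldsymbol{e}. \]

Then as is easily verified, $\varphi(\boldsymbol{g}, \boldsymbol{g}) = 0$, $\varphi(\boldsymbol{e}, \boldsymbol{g}) = \varphi(\boldsymbol{e}, \boldsymbol{f}) \ne 0$, and the vectors $\boldsymbol{e}, \boldsymbol{g}$ are linearly independent, that is, $\mathsf{W} = \langle \boldsymbol{e}, \boldsymbol{g} \rangle$. In the basis $\boldsymbol{e}, \boldsymbol{g}$, the matrix of the bilinear form $\varphi'$ has the form

\[ \begin{pmatrix} 0 & \varphi(\boldsymbol{e}, \boldsymbol{g}) \\ \varphi(\boldsymbol{e}, \boldsymbol{g}) & 0 \end{pmatrix}. \]

As a result of the nondegeneracy of the bilinear form $\varphi'$, we have by Theorem 6.9 the decomposition

\[ \mathsf{L} = \mathsf{W} \oplus \mathsf{L}_1, \qquad \mathsf{L}_1 = \mathsf{W}^{\perp}_{\varphi}, \tag{11.30} \]

where $\dim \mathsf{L}_1 = 2m - 2$. Let us set $\mathsf{M}_1 = \mathsf{L}_1 \cap \mathsf{M}$ and show that $\mathsf{M}_1$ is a subspace of dimension $m - 1$ isotropic with respect to the restriction of the bilinear form $\varphi$ to $\mathsf{L}_1$.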

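By construction, the subspace $\mathsf{M}_1$ consists of the vectors $\boldsymbol{x} \in \mathsf{M}$ such that $\varphi(\boldsymbol{x}, \boldsymbol{e}) = 0$ and $\varphi(\boldsymbol{x}, \boldsymbol{g}) = 0$. But the first equality holds in general for all $\boldsymbol{x} \in \mathsf{M}$, since $\boldsymbol{e} \in \mathsf{M}$ and $\mathsf{M}$ is isotropic with respect to $\varphi$. Thus in the definition of the subspace $\mathsf{M}_1$, there remains only the second equality, which means that $\mathsf{M}_1 \subset \mathsf{M}$ is the set of vectors of $\mathsf{M}$ sent to zero by the linear function $f(\boldsymbol{x}) = \varphi(\boldsymbol{x}, \boldsymbol{g})$, which is not identically equal to zero on $\mathsf{M}$ (since $f(\boldsymbol{e}) = \varphi(\boldsymbol{e}, \boldsymbol{g}) \ne 0$). Therefore, $\dim \mathsf{M}_1 = \dim \mathsf{M} - 1 = m - 1$.

Thus $\mathsf{M}_1$ is an isotropic subspace of $\mathsf{L}_1$ of half the dimension of $\mathsf{L}_1$, defined by formula (11.30), and we can apply the induction hypothesis to it to obtain the decomposition

\[ \mathsf{L}_1 = \mathsf{M}_1 \oplus \mathsf{N}_1, \tag{11.31} \]

where $\mathsf{N}_1 \subset \mathsf{L}_1$ is some other $(m - 1)$-dimensional isotropic subspace.

Let us note that $\mathsf{M} = \langle \boldsymbol{e} \rangle \oplus \mathsf{M}_1$ and let us set $\mathsf{N} = \langle \boldsymbol{g} \rangle \oplus \mathsf{N}_1$. The subspace $\mathsf{N}$ is isotropic in $\mathsf{L}$: indeed, the subspace $\mathsf{N}_1$ is isotropic in $\mathsf{L}_1$, we have $\varphi(\boldsymbol{g}, \boldsymbol{g}) = 0$, and for all vectors $\boldsymbol{x} \in \mathsf{N}_1$ we have the equality $\varphi(\boldsymbol{g}, \boldsymbol{x}) = 0$, since $\mathsf{N}_1 \subset \mathsf{L}_1 = \mathsf{W}^{\perp}_{\varphi}$. Formulas (11.30) and (11.31) together give the decomposition

\[ \mathsf{L} = \langle \boldsymbol{e} \rangle \oplus \langle \boldsymbol{g} \rangle \oplus \mathsf{M}_1 \oplus \mathsf{N}_1 = \mathsf{M} \oplus \mathsf{N}, \]

which is what was to be proved. □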
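
The key step of the proof, the passage from an arbitrary vector $\boldsymbol{f}$ with $\varphi(\boldsymbol{e}, \boldsymbol{f}) \ne 0$ to an isotropic vector $\boldsymbol{g}$, can be sketched in Python as follows. This is an illustration under our own assumptions: the bilinear form is given by a Gram matrix with rational entries, and the name `hyperbolic_partner` is a hypothetical helper introduced here, not terminology from the text.

```python
from fractions import Fraction


def phi(gram, u, v):
    """Value of the bilinear form with Gram matrix `gram` on the vectors u and v."""
    n = len(gram)
    return sum(Fraction(u[i]) * gram[i][j] * v[j] for i in range(n) for j in range(n))


def hyperbolic_partner(gram, e, f):
    """Return g = f - phi(f, f) / (2 * phi(e, f)) * e for isotropic e with phi(e, f) != 0."""
    ef = phi(gram, e, f)
    if phi(gram, e, e) != 0 or ef == 0:
        raise ValueError("need phi(e, e) = 0 and phi(e, f) != 0")
    c = phi(gram, f, f) / (2 * ef)
    return [Fraction(fi) - c * ei for ei, fi in zip(e, f)]


# Sample data (an assumption for illustration): the form x1^2 + x2^2 - x3^2 - x4^2.
gram = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
e = [1, 0, 1, 0]   # phi(e, e) = 1 - 1 = 0
f = [1, 0, 0, 0]   # phi(e, f) = 1, phi(f, f) = 1
g = hyperbolic_partner(gram, e, f)
```

For this sample data the vector $\boldsymbol{g}$ equals $\boldsymbol{f} - \tfrac{1}{2}\boldsymbol{e}$, and one checks directly that $\varphi(\boldsymbol{g}, \boldsymbol{g}) = 0$ and $\varphi(\boldsymbol{e}, \boldsymbol{g}) = \varphi(\boldsymbol{e}, \boldsymbol{f})$.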

In the terminology of Theorem 11.12, an arbitrary vector $\boldsymbol{z} \in \mathsf{N}$ determines a linear function $f(\boldsymbol{x}) = \varphi(\boldsymbol{z}, \boldsymbol{x})$ on the vector space $\mathsf{L}$, that is, an element of the dual space $\mathsf{L}^*$. The restriction of this function to the subspace $\mathsf{M} \subset \mathsf{L}$ is obviously a linear function on $\mathsf{M}$, that is, an element of the space $\mathsf{M}^*$. This defines the mapping $\mathcal{F} : \mathsf{N} \to \mathsf{M}^*$. A trivial verification shows that $\mathcal{F}$ is a linear transformation.

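The decomposition (11.29) established by Theorem 11.12 has an interesting consequence.

Lemma 11.13 The linear transformation $\mathcal{F} : \mathsf{N} \to \mathsf{M}^*$ constructed above is an isomorphism.

Proof Let us determine the kernel of the transformation $\mathcal{F} : \mathsf{N} \to \mathsf{M}^*$. Let us assume that $\mathcal{F}(\boldsymbol{z}_0) = 0$ for some $\boldsymbol{z}_0 \in \mathsf{N}$, that is, $\varphi(\boldsymbol{z}_0, \boldsymbol{y}) = 0$ for all vectors $\boldsymbol{y} \in \mathsf{M}$. But by Theorem 11.12, every vector $\boldsymbol{x} \in \mathsf{L}$ can be represented in the form $\boldsymbol{x} = \boldsymbol{y} + \boldsymbol{z}$, where $\boldsymbol{y} \in \mathsf{M}$ and $\boldsymbol{z} \in \mathsf{N}$. Thus

\[ \varphi(\boldsymbol{z}_0, \boldsymbol{x}) = \varphi(\boldsymbol{z}_0, \boldsymbol{y}) + \varphi(\boldsymbol{z}_0, \boldsymbol{z}) = \varphi(\boldsymbol{z}_0, \boldsymbol{z}) = 0, \]

since both vectors $\boldsymbol{z}$ and $\boldsymbol{z}_0$ belong to the isotropic subspace $\mathsf{N}$. From the nonsingularity of the bilinear form $\varphi$, it then follows that $\boldsymbol{z}_0 = \boldsymbol{0}$, that is, the kernel of $\mathcal{F}$ consists of only the null vector. Since $\dim \mathsf{M} = \dim \mathsf{N}$, we have by Theorem 3.68 that the linear transformation $\mathcal{F}$ is an isomorphism. □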

Let $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ be some basis in $\mathsf{M}$, and $f_1, \ldots, f_m$ the dual basis in $\mathsf{M}^*$. The isomorphism $\mathcal{F}$ that we constructed creates a correspondence between this dual basis and a certain basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_m$ in the space $\mathsf{N}$ according to the formula $\mathcal{F}(\boldsymbol{g}_i) = f_i$. From decomposition (11.29) established in Theorem 11.12, it follows that vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m, \boldsymbol{g}_1, \ldots, \boldsymbol{g}_m$ form a basis in $\mathsf{L}$. In this basis, the bilinear form $\varphi$ has the simplest possible matrix $\Phi$. Indeed, recalling the definitions of concepts that we have used, we obtain that

\[ \Phi = \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix}, \tag{11.32} \]

where $E$ and $0$ are the identity and zero matrices of order $m$. For the corresponding quadratic form $F$ and vector

\[ \boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_m \boldsymbol{e}_m + x_{m+1} \boldsymbol{g}_1 + \cdots + x_{2m} \boldsymbol{g}_m, \]

we obtain

\[ F(\boldsymbol{x}) = \sum_{i=1}^{m} x_i x_{m+i}. \tag{11.33} \]

Conversely, if in some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{2m}$ of the vector space $\mathsf{L}$, the bilinear form $\varphi$ has matrix (11.32), then the space $\mathsf{L}$ can be represented in the form

\[ \mathsf{L} = \mathsf{M} \oplus \mathsf{N}, \qquad \mathsf{M} = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_m \rangle, \qquad \mathsf{N} = \langle \boldsymbol{e}_{m+1}, \ldots, \boldsymbol{e}_{2m} \rangle, \]

in accordance with Theorem 11.12. Let us recall that in our case (in a complex projective space), all nonsingular symmetric bilinear forms are equivalent, and therefore, every nonsingular symmetric bilinear form $\varphi$ has matrix (11.32) in some basis. In particular, we see that in the $2m$-dimensional space $\mathsf{L}$, there exists an $m$-dimensional isotropic subspace $\mathsf{M}$.

In order to generalize known results from analytic geometry for $m = 2$ to the case of arbitrary $m$ (see Example 11.11), we shall provide several definitions that naturally generalize some concepts about Euclidean spaces familiar to us from Chap. 7.

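Definition 11.14 Let $\varphi(\boldsymbol{x}, \boldsymbol{y})$ be a nonsingular symmetric bilinear form in the space $\mathsf{L}$ of arbitrary dimension. A linear transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ is said to be orthogonal with respect to $\varphi$ if

\[ \varphi(\mathcal{U}(\boldsymbol{x}), \mathcal{U}(\boldsymbol{y})) = \varphi(\boldsymbol{x}, \boldsymbol{y}) \tag{11.34} \]

for all vectors $\boldsymbol{x}, \boldsymbol{y} \in \mathsf{L}$.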

This definition generalizes the notion of orthogonal transformation of a Euclidean space and Lorentz transformation of a pseudo-Euclidean space. Similarly, we shall call a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of a space $\mathsf{L}$ orthonormal with respect to a bilinear form $\varphi$ if $\varphi(\boldsymbol{e}_i, \boldsymbol{e}_i) = 1$ and $\varphi(\boldsymbol{e}_i, \boldsymbol{e}_j) = 0$ for all $i \ne j$. Every orthogonal transformation takes an orthonormal basis into an orthonormal basis, and for any two orthonormal bases, there exists a unique orthogonal transformation taking the first of them to the second. The proofs of these assertions coincide word for word with the analogous assertions from Section 7.2, since there we nowhere used the positive definiteness of the bilinear form $(\boldsymbol{x}, \boldsymbol{y})$, but only its nonsingularity.

The condition (11.34) can be expressed in matrix form. Let the bilinear form $\varphi$ have matrix $\Phi$ in some basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ of the space $\mathsf{L}$. Then the transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ will be orthogonal with respect to $\varphi$ if and only if its matrix $U$ in this basis satisfies the relationship

\[ U^* \Phi U = \Phi. \tag{11.35} \]

This is proved just as was the analogous equality (7.18) for orthogonal transformations of Euclidean spaces, and (7.18) is a special case of formula (11.35) for $\Phi = E$.

It follows from formula (11.35) that $|U^*| \cdot |\Phi| \cdot |U| = |\Phi|$, and taking into account the nonsingularity of the form $\varphi$ ($|\Phi| \ne 0$), that $|U^*| \cdot |U| = 1$, that is, $|U|^2 = 1$. From this we finally obtain the equality $|U| = \pm 1$, in which $|U|$ can be replaced by $|\mathcal{U}|$, since the determinant of a linear transformation does not depend on the choice of basis in the space, and consequently, coincides with the determinant of the matrix of this transformation.

The equality $|\mathcal{U}| = \pm 1$ generalizes a well-known property of orthogonal transformations of a Euclidean space and provides justification for an analogous definition.
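
The matrix condition (11.35) is easy to test on small examples. The Python sketch below is an illustration only; it assumes that $U^*$ in (11.35) denotes the transposed matrix, as in the Euclidean formula (7.18), and the helper names `is_orthogonal` and `det` are our own. It takes the matrix (11.32) for $m = 1$ and checks two transformations, one with determinant $1$ and one with determinant $-1$.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def det(a):
    """Determinant by exact Gaussian elimination with row swaps."""
    m = [[Fraction(x) for x in row] for row in a]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            m[i] = [p - factor * q for p, q in zip(m[i], m[col])]
    return sign * result


def is_orthogonal(u, gram):
    """Check relationship (11.35), with U* taken to be the transpose of U."""
    return matmul(matmul(transpose(u), gram), u) == gram


gram = [[0, 1], [1, 0]]                 # matrix (11.32) for m = 1
scale = [[2, 0], [0, Fraction(1, 2)]]   # keeps each isotropic line in place
swap = [[0, 1], [1, 0]]                 # exchanges the two isotropic lines
```

Here `scale` satisfies (11.35) with determinant $1$, while `swap` satisfies it with determinant $-1$; a matrix such as $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ fails the condition.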

Definition 11.15 A linear transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ orthogonal with respect to a symmetric bilinear form $\varphi$ is said to be proper if $|\mathcal{U}| = 1$ and improper if $|\mathcal{U}| = -1$.

It follows at once from Theorem 2.54 on the determinant of the product of matrices that proper and improper transformations multiply just like the numbers $+1$ and $-1$. Similarly, the transformation $\mathcal{U}^{-1}$ is of the same type (proper or improper) as $\mathcal{U}$.

The concepts that we have introduced can be applied to the theory of isotropic subspaces on the basis of the following result.

Theorem 11.16 For any two $m$-dimensional isotropic subspaces $\mathsf{M}$ and $\mathsf{M}'$ of a $2m$-dimensional space $\mathsf{L}$, there exists an orthogonal transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ taking one of the subspaces to the other.

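Proof Since Theorem 11.12 can be applied to each of the subspaces $\mathsf{M}$ and $\mathsf{M}'$, there exist $m$-dimensional isotropic subspaces $\mathsf{N}$ and $\mathsf{N}'$ such that

\[ \mathsf{L} = \mathsf{M} \oplus \mathsf{N} = \mathsf{M}' \oplus \mathsf{N}'. \]

As we have noted above, from the decomposition $\mathsf{L} = \mathsf{M} \oplus \mathsf{N}$, it follows that in the space $\mathsf{L}$, there exists a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{2m}$ comprising the bases of the subspaces $\mathsf{M}$ and $\mathsf{N}$ in which the matrix of the bilinear form $\varphi$ is equal to (11.32). The second decomposition $\mathsf{L} = \mathsf{M}' \oplus \mathsf{N}'$ gives us a similar basis $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_{2m}$.

Let us define the transformation $\mathcal{U}$ by the action on the vectors of the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{2m}$ according to the formula $\mathcal{U}(\boldsymbol{e}_i) = \boldsymbol{e}'_i$ for all $i = 1, \ldots, 2m$. It is obvious that then the image $\mathcal{U}(\mathsf{M})$ is equal to $\mathsf{M}'$. Furthermore, for any two vectors $\boldsymbol{x} = x_1 \boldsymbol{e}_1 + \cdots + x_{2m} \boldsymbol{e}_{2m}$ and $\boldsymbol{y} = y_1 \boldsymbol{e}_1 + \cdots + y_{2m} \boldsymbol{e}_{2m}$, their images $\mathcal{U}(\boldsymbol{x})$ and $\mathcal{U}(\boldsymbol{y})$ have, in the basis $\boldsymbol{e}'_1, \ldots, \boldsymbol{e}'_{2m}$, decompositions with the same coordinates: $\mathcal{U}(\boldsymbol{x}) = x_1 \boldsymbol{e}'_1 + \cdots + x_{2m} \boldsymbol{e}'_{2m}$ and $\mathcal{U}(\boldsymbol{y}) = y_1 \boldsymbol{e}'_1 + \cdots + y_{2m} \boldsymbol{e}'_{2m}$. Since the bilinear form $\varphi$ has one and the same matrix (11.32) in both bases, from this it follows that

\[ \varphi(\mathcal{U}(\boldsymbol{x}), \mathcal{U}(\boldsymbol{y})) = \sum_{i=1}^{m} (x_i y_{m+i} + x_{m+i} y_i) = \varphi(\boldsymbol{x}, \boldsymbol{y}), \]

showing that $\mathcal{U}$ is an orthogonal transformation. □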

Let us note that Theorem 11.16 does not assert the uniqueness of such a transformation $\mathcal{U}$. In fact, it is not unique. Let us consider this question in more detail. Let $\mathcal{U}_1$ and $\mathcal{U}_2$ be two orthogonal transformations of the kind described in Theorem 11.16, both taking $\mathsf{M}$ to $\mathsf{M}'$. Applying to both sides of the equality $\mathcal{U}_1(\mathsf{M}) = \mathcal{U}_2(\mathsf{M})$ the transformation $\mathcal{U}_1^{-1}$, we obtain $\mathcal{U}_0(\mathsf{M}) = \mathsf{M}$, where $\mathcal{U}_0 = \mathcal{U}_1^{-1} \mathcal{U}_2$ is also an orthogonal transformation. Our further considerations are based on the following result.

Lemma 11.17 Let $\mathsf{M}$ be an $m$-dimensional isotropic subspace of a $2m$-dimensional space $\mathsf{L}$, and let $\mathcal{U}_0 : \mathsf{L} \to \mathsf{L}$ be an orthogonal transformation taking $\mathsf{M}$ to itself. Then the transformation $\mathcal{U}_0$ is proper.

Proof By assumption, $\mathsf{M}$ is an invariant subspace of the transformation $\mathcal{U}_0$. This means that in an arbitrary basis of the space $\mathsf{L}$ whose first $m$ vectors form a basis of $\mathsf{M}$, the matrix of the transformation $\mathcal{U}_0$ has the block form

\[ U_0 = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}, \tag{11.36} \]

where $A$, $B$, $C$ are square matrices of order $m$.

The orthogonality of the transformation $\mathcal{U}_0$ is expressed by the relationship (11.35), in which, as we have seen, with the selection of a suitable basis, we may consider that relationship (11.32) is satisfied. Setting in (11.35) in place of $U$ the matrix (11.36), we obtain

\[ \begin{pmatrix} A^* & 0 \\ B^* & C^* \end{pmatrix} \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & C \end{pmatrix} = \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix}. \]

Multiplying the matrices on the left-hand side of this equality brings it into the form

\[ \begin{pmatrix} 0 & A^* C \\ C^* A & D \end{pmatrix} = \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix}, \qquad \text{where } D = C^* B + B^* C. \]

From this, we obtain in particular $A^* C = E$, and this means that $|A^*| \cdot |C| = 1$. But in view of $|A^*| = |A|$, from (11.36) we have $|U_0| = |A| \cdot |C| = 1$, as asserted. □

From Lemma 11.17 we deduce the following important corollary.

Theorem 11.18 If $\mathsf{M}$ and $\mathsf{M}'$ are two $m$-dimensional isotropic subspaces of a $2m$-dimensional space $\mathsf{L}$, then the orthogonal transformations $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ taking one of these subspaces into the other are either all proper or all improper.

Proof Let $\mathcal{U}_1$ and $\mathcal{U}_2$ be two orthogonal transformations such that $\mathcal{U}_i(\mathsf{M}) = \mathsf{M}'$. It is clear that then $\mathcal{U}_i^{-1}(\mathsf{M}') = \mathsf{M}$. Setting $\mathcal{U}_0 = \mathcal{U}_1^{-1} \mathcal{U}_2$, from the equality $\mathcal{U}_1(\mathsf{M}) = \mathcal{U}_2(\mathsf{M})$ we obtain that $\mathcal{U}_0(\mathsf{M}) = \mathsf{M}$. By Lemma 11.17, $|\mathcal{U}_0| = 1$, and from the relationship $\mathcal{U}_0 = \mathcal{U}_1^{-1} \mathcal{U}_2$, it follows that $|\mathcal{U}_1| = |\mathcal{U}_2|$. □

Theorem 11.18 determines in an obvious way a partition of the set of all $m$-dimensional isotropic subspaces $\mathsf{M}$ of a $2m$-dimensional space $\mathsf{L}$ into two families $\mathfrak{M}_1$ and $\mathfrak{M}_2$. Namely, $\mathsf{M}$ and $\mathsf{M}'$ belong to one family if an orthogonal transformation $\mathcal{U}$ taking one of these subspaces into the other (which always exists, by Theorem 11.16) is proper (it follows from Theorem 11.18 that this definition does not depend on the choice of a specific transformation $\mathcal{U}$).

Now we can easily prove the following property, which was established in the 
previous section for m = 2, for any m. 

Theorem 11.19 Two $m$-dimensional isotropic subspaces $\mathsf{M}$ and $\mathsf{M}'$ of a $2m$-dimensional space $\mathsf{L}$ belong to one family $\mathfrak{M}_i$ if and only if the dimension of their intersection $\mathsf{M} \cap \mathsf{M}'$ has the same parity as $m$.

Proof Let us recall that natural numbers $k$ and $m$ have the same parity if $k + m$ is even, or equivalently, if $(-1)^{k+m} = 1$. Recalling now the definition of the partition of the set of $m$-dimensional isotropic subspaces into families $\mathfrak{M}_1$ and $\mathfrak{M}_2$ and setting $k = \dim(\mathsf{M} \cap \mathsf{M}')$, we may formulate the assertion of the theorem as follows:

\[ |\mathcal{U}| = (-1)^{k+m}, \tag{11.37} \]

where $\mathcal{U}$ is an arbitrary orthogonal transformation taking $\mathsf{M}$ to $\mathsf{M}'$, that is, a transformation such that $\mathcal{U}(\mathsf{M}) = \mathsf{M}'$.

Let us begin the proof of relationship (11.37) with the case $k = 0$, that is, the case that $\mathsf{M} \cap \mathsf{M}' = (\boldsymbol{0})$. Then in view of the equality $\dim \mathsf{M} + \dim \mathsf{M}' = \dim \mathsf{L}$, the sum of subspaces $\mathsf{M} + \mathsf{M}' = \mathsf{M} \oplus \mathsf{M}'$ coincides with the entire space $\mathsf{L}$. This means that $\mathsf{M}'$ exhibits all the properties of the isotropic subspace $\mathsf{N}$ constructed for the proof of Theorem 11.12. In particular, there exist bases $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ in $\mathsf{M}$ and $\boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$ in $\mathsf{M}'$ such that

\[ \varphi(\boldsymbol{e}_i, \boldsymbol{f}_i) = 1 \quad \text{for } i = 1, \ldots, m, \qquad \varphi(\boldsymbol{e}_i, \boldsymbol{f}_j) = 0 \quad \text{for } i \ne j. \]

We shall determine the transformation $\mathcal{U} : \mathsf{L} \to \mathsf{L}$ by the conditions $\mathcal{U}(\boldsymbol{e}_i) = \boldsymbol{f}_i$ and $\mathcal{U}(\boldsymbol{f}_i) = \boldsymbol{e}_i$ for all $i = 1, \ldots, m$. It is clear that $\mathcal{U}(\mathsf{M}) = \mathsf{M}'$ and $\mathcal{U}(\mathsf{M}') = \mathsf{M}$. It is equally easy to see that in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m, \boldsymbol{f}_1, \ldots, \boldsymbol{f}_m$, the matrices of the transformation $\mathcal{U}$ and bilinear form $\varphi$ coincide and have the form (11.32). Substituting the matrix (11.32) in place of $U$ and $\Phi$ into formula (11.35), we see that it is converted to a true equality, that is, the transformation $\mathcal{U}$ is orthogonal.

Since the two matrices coincide, we have the equality $|\mathcal{U}| = |\Phi|$. It is easy to convince oneself that $|\Phi| = (-1)^m$ by transposing the rows of the matrix (11.32) with indices $i$ and $m + i$ for all $i = 1, \ldots, m$. Here we shall carry out $m$ transpositions and obtain the identity matrix of order $2m$ with determinant $1$. As a result, we arrive at the equality $|\mathcal{U}| = (-1)^m$, that is, at relationship (11.37) for $k = 0$.

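Now let us examine the case $k > 0$. Let us define the subspace $\mathsf{M}_1 = \mathsf{M} \cap \mathsf{M}'$. Then $k = \dim \mathsf{M}_1$. By Theorem 11.12, there exists an $m$-dimensional isotropic subspace $\mathsf{N} \subset \mathsf{L}$ such that $\mathsf{L} = \mathsf{M} \oplus \mathsf{N}$. Let us choose in the subspace $\mathsf{M}$ a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ such that its first $k$ vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_k$ form a basis in $\mathsf{M}_1$. Then clearly, we have the decomposition

\[ \mathsf{M} = \mathsf{M}_1 \oplus \mathsf{M}_2, \quad \text{where } \mathsf{M}_1 = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_k \rangle, \quad \mathsf{M}_2 = \langle \boldsymbol{e}_{k+1}, \ldots, \boldsymbol{e}_m \rangle. \]

Above (see Lemma 11.13), we constructed the isomorphism $\mathcal{F} : \mathsf{N} \xrightarrow{\sim} \mathsf{M}^*$ and with its help, defined a basis $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_m$ in the space $\mathsf{N}$ by formula $\mathcal{F}(\boldsymbol{g}_i) = f_i$, where $f_1, \ldots, f_m$ is a basis of the space $\mathsf{M}^*$, the dual basis to $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$. We obviously have the decomposition

\[ \mathsf{N} = \mathsf{N}_1 \oplus \mathsf{N}_2, \quad \text{where } \mathsf{N}_1 = \langle \boldsymbol{g}_1, \ldots, \boldsymbol{g}_k \rangle, \quad \mathsf{N}_2 = \langle \boldsymbol{g}_{k+1}, \ldots, \boldsymbol{g}_m \rangle, \]

where by our construction, $\mathcal{F} : \mathsf{N}_1 \xrightarrow{\sim} \mathsf{M}_1^*$ and $\mathcal{F} : \mathsf{N}_2 \xrightarrow{\sim} \mathsf{M}_2^*$.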

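Let us consider the linear transformation $\mathcal{U}_0 : \mathsf{L} \to \mathsf{L}$ defined by the formula

\[ \mathcal{U}_0(\boldsymbol{e}_i) = \boldsymbol{g}_i, \quad \mathcal{U}_0(\boldsymbol{g}_i) = \boldsymbol{e}_i \quad \text{for } i = 1, \ldots, k, \]
\[ \mathcal{U}_0(\boldsymbol{e}_i) = \boldsymbol{e}_i, \quad \mathcal{U}_0(\boldsymbol{g}_i) = \boldsymbol{g}_i \quad \text{for } i = k + 1, \ldots, m. \]

It is obvious that the transformation $\mathcal{U}_0$ is orthogonal, and also $\mathcal{U}_0^2 = \mathcal{E}$ and

\[ \mathcal{U}_0(\mathsf{M}_1) = \mathsf{N}_1, \quad \mathcal{U}_0(\mathsf{M}_2) = \mathsf{M}_2, \quad \mathcal{U}_0(\mathsf{N}_1) = \mathsf{M}_1, \quad \mathcal{U}_0(\mathsf{N}_2) = \mathsf{N}_2. \tag{11.38} \]

In the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m, \boldsymbol{g}_1, \ldots, \boldsymbol{g}_m$ that we constructed in the space $\mathsf{L}$, the matrix of the transformation $\mathcal{U}_0$ has the block form

\[ U_0 = \begin{pmatrix} 0 & 0 & E_k & 0 \\ 0 & E_{m-k} & 0 & 0 \\ E_k & 0 & 0 & 0 \\ 0 & 0 & 0 & E_{m-k} \end{pmatrix}, \]

where $E_k$ and $E_{m-k}$ are the identity matrices of orders $k$ and $m - k$. As is evident, $U_0$ becomes the identity matrix after the transposition of its rows with indices $i$ and $m + i$, $i = 1, \ldots, k$. Therefore, $|\mathcal{U}_0| = (-1)^k$.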

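Let us prove that $\mathcal{U}_0(\mathsf{M}') \cap \mathsf{M} = (\boldsymbol{0})$. Since $\mathcal{U}_0^2 = \mathcal{E}$, this is equivalent to $\mathsf{M}' \cap \mathcal{U}_0(\mathsf{M}) = (\boldsymbol{0})$. Let us assume that $\boldsymbol{x} \in \mathsf{M}' \cap \mathcal{U}_0(\mathsf{M})$. From the membership $\boldsymbol{x} \in \mathcal{U}_0(\mathsf{M})$ and decomposition $\mathsf{M} = \mathsf{M}_1 \oplus \mathsf{M}_2$, taking into account (11.38), it follows that $\boldsymbol{x} \in \mathsf{N}_1 \oplus \mathsf{M}_2$, that is,

\[ \boldsymbol{x} = \boldsymbol{z}_1 + \boldsymbol{y}_2, \quad \text{where } \boldsymbol{z}_1 \in \mathsf{N}_1, \quad \boldsymbol{y}_2 \in \mathsf{M}_2. \tag{11.39} \]

Thus for every vector $\boldsymbol{y}_1 \in \mathsf{M}_1$, we have the equality

\[ \varphi(\boldsymbol{x}, \boldsymbol{y}_1) = \varphi(\boldsymbol{z}_1, \boldsymbol{y}_1) + \varphi(\boldsymbol{y}_2, \boldsymbol{y}_1). \tag{11.40} \]

The left-hand side of equality (11.40) equals zero, since $\boldsymbol{x} \in \mathsf{M}'$, $\boldsymbol{y}_1 \in \mathsf{M}_1 \subset \mathsf{M}'$, and the subspace $\mathsf{M}'$ is isotropic with respect to $\varphi$. The second term $\varphi(\boldsymbol{y}_2, \boldsymbol{y}_1)$ on the right-hand side is equal to zero, since $\boldsymbol{y}_i \in \mathsf{M}_i \subset \mathsf{M}$, $i = 1, 2$, and the subspace $\mathsf{M}$ is isotropic with respect to $\varphi$. Thus from relationship (11.40), it follows that $\varphi(\boldsymbol{z}_1, \boldsymbol{y}_1) = 0$ for every vector $\boldsymbol{y}_1 \in \mathsf{M}_1$.

This last conclusion means that for the isomorphism $\mathcal{F} : \mathsf{N}_1 \xrightarrow{\sim} \mathsf{M}_1^*$, there corresponds to the vector $\boldsymbol{z}_1 \in \mathsf{N}_1$ a linear function on $\mathsf{M}_1$ that is identically equal to zero. But that can be the case only if the vector $\boldsymbol{z}_1$ itself is equal to $\boldsymbol{0}$. Thus in the decomposition (11.39), we have $\boldsymbol{z}_1 = \boldsymbol{0}$, and therefore, the vector $\boldsymbol{x} = \boldsymbol{y}_2$ is contained in the subspace $\mathsf{M}_2$. On the other hand, by virtue of the inclusions $\mathsf{M}_2 \subset \mathsf{M}$ and $\boldsymbol{x} \in \mathsf{M}' \cap \mathcal{U}_0(\mathsf{M})$, taking into account the definition of the subspace $\mathsf{M}_1 = \mathsf{M} \cap \mathsf{M}'$, this vector is also contained in $\mathsf{M}_1$. As a result, we obtain that $\boldsymbol{x} \in \mathsf{M}_1 \cap \mathsf{M}_2$, while by virtue of the decomposition $\mathsf{M} = \mathsf{M}_1 \oplus \mathsf{M}_2$, this means that $\boldsymbol{x} = \boldsymbol{0}$.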

Thus the subspaces $\mathcal{U}_0(\mathsf{M}')$ and $\mathsf{M}$ are included in the case $k = 0$ already considered, and relationship (11.37) has been proved for them. By Theorem 11.16, there exists an orthogonal transformation $\mathcal{U}_1 : \mathsf{L} \to \mathsf{L}$ such that $\mathcal{U}_1(\mathcal{U}_0(\mathsf{M}')) = \mathsf{M}$. Then, as we have proved, $|\mathcal{U}_1| = (-1)^m$. The orthogonal transformation $\mathcal{U} = \mathcal{U}_1 \mathcal{U}_0$ takes the isotropic subspace $\mathsf{M}'$ to $\mathsf{M}$, and for it we have the relationship

\[ |\mathcal{U}| = |\mathcal{U}_1| \cdot |\mathcal{U}_0| = (-1)^m (-1)^k = (-1)^{k+m}, \]

which completes the proof of the theorem. □
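
The properties of the auxiliary transformation $\mathcal{U}_0$ used in this proof can be confirmed for small $m$ and $k$ by direct computation. The Python sketch below is an illustration under our own conventions: the basis is ordered as $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m, \boldsymbol{g}_1, \ldots, \boldsymbol{g}_m$, the names `gram_matrix` and `swap_matrix` are hypothetical helpers, and $U^*$ is taken to be the transpose.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def det(a):
    """Determinant by exact Gaussian elimination with row swaps."""
    m = [[Fraction(x) for x in row] for row in a]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            m[i] = [p - factor * q for p, q in zip(m[i], m[col])]
    return sign * result


def gram_matrix(m):
    """Matrix (11.32) of order 2m: blocks 0, E, E, 0."""
    n = 2 * m
    return [[1 if abs(i - j) == m else 0 for j in range(n)] for i in range(n)]


def swap_matrix(m, k):
    """Matrix of U_0: exchanges e_i and g_i for i <= k and fixes the remaining basis vectors."""
    n = 2 * m
    u = [[0] * n for _ in range(n)]
    for i in range(m):
        if i < k:
            u[i][m + i] = 1
            u[m + i][i] = 1
        else:
            u[i][i] = 1
            u[m + i][m + i] = 1
    return u


# Example: m = 3, k = 2 gives an orthogonal involution with determinant (-1)^2 = 1.
example = swap_matrix(3, 2)
```

For each pair $(m, k)$ one can check in this way that `swap_matrix(m, k)` satisfies (11.35), squares to the identity, and has determinant $(-1)^k$.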

We note two corollaries to Theorem 11.19. 

Corollary 11.20 The families $\mathfrak{M}_1$ and $\mathfrak{M}_2$ do not have an $m$-dimensional isotropic subspace in common.

Proof Let us assume that two such $m$-dimensional isotropic subspaces $\mathsf{M}_1 \in \mathfrak{M}_1$ and $\mathsf{M}_2 \in \mathfrak{M}_2$ are to be found such that $\mathsf{M}_1 = \mathsf{M}_2$. Then we clearly have the equality $\dim(\mathsf{M}_1 \cap \mathsf{M}_2) = m$, and by Theorem 11.19, $\mathsf{M}_1$ and $\mathsf{M}_2$ cannot belong to different families $\mathfrak{M}_1$ and $\mathfrak{M}_2$. □

Corollary 11.21 If two $m$-dimensional isotropic subspaces intersect in a subspace of dimension $m - 1$, then they belong to different families $\mathfrak{M}_1$ and $\mathfrak{M}_2$.

This follows from the fact that $m$ and $m - 1$ have opposite parity.

Case 2. Now we may proceed to an examination of the second case, in which the dimension of the space $\mathsf{L}$ is odd. It is considerably easier and can be reduced to the already considered case of even dimensionality.

In order to retain the previous notation used in the even-dimensional case, let us denote by $\mathsf{L}$ the space of odd dimension $2m + 1$ under consideration and let us embed it as a hyperplane in a space $\overline{\mathsf{L}}$ of dimension $2m + 2$. Let us denote by $\overline{F}$ a nonsingular quadratic form on $\overline{\mathsf{L}}$ and by $F$ its restriction to $\mathsf{L}$. Our further reasoning will be based on the following fact.

Lemma 11.22 For every nonsingular quadratic form $\overline{F}$ there exists a hyperplane $\mathsf{L} \subset \overline{\mathsf{L}}$ such that the quadratic form $F$ is nonsingular.

Proof In a complex projective space, all nonsingular quadratic forms are equivalent. And therefore, it suffices to prove the required assertion for any one form $\overline{F}$. For $\overline{F}$, let us take the nonsingular form (11.33) that we encountered previously with $m$ replaced by $m + 1$. Thus for a vector $\boldsymbol{x} \in \overline{\mathsf{L}}$ with coordinates $(x_1, \ldots, x_{2m+2})$, we have

\[ \overline{F}(\boldsymbol{x}) = \sum_{i=1}^{m+1} x_i x_{m+1+i}. \tag{11.41} \]

Let us define a hyperplane $\mathsf{L} \subset \overline{\mathsf{L}}$ by the equation $x_1 = x_{m+2}$. The coordinates in $\mathsf{L}$ are collections $(x_1, \ldots, x_{m+1}, \breve{x}_{m+2}, x_{m+3}, \ldots, x_{2m+2})$, where the symbol $\breve{\ }$ indicates the omission of the coordinate underneath it, and the quadratic form $F$ in these coordinates takes the form

\[ F(\boldsymbol{x}) = x_1^2 + \sum_{i=2}^{m+1} x_i x_{m+1+i}. \tag{11.42} \]

The matrix of the quadratic form (11.42) has the block form

\[ \begin{pmatrix} 1 & 0 \\ 0 & \Phi \end{pmatrix}, \]

where $\Phi$ is the matrix from formula (11.32). Since the determinant $|\Phi|$ is nonzero, it follows that the quadratic form (11.42) is nonsingular. □
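
The computation in this proof can be reproduced for small $m$. The following Python sketch is an illustration under stated assumptions: it works with the symmetric Gram matrix of the form (11.41), whose off-diagonal entries are $\tfrac{1}{2}$, so the lower block it produces is $\tfrac{1}{2}\Phi$ rather than $\Phi$ (a scalar factor that does not affect nonsingularity); the names `gram_full`, `hyperplane_basis`, and `restricted_gram` are our own hypothetical helpers.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def det(a):
    """Determinant by exact Gaussian elimination with row swaps."""
    m = [[Fraction(x) for x in row] for row in a]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            m[i] = [p - factor * q for p, q in zip(m[i], m[col])]
    return sign * result


def gram_full(m):
    """Symmetric Gram matrix of the form (11.41) in dimension 2m + 2."""
    n = 2 * m + 2
    half = Fraction(1, 2)
    return [[half if abs(i - j) == m + 1 else Fraction(0) for j in range(n)] for i in range(n)]


def hyperplane_basis(m):
    """Basis of the hyperplane x_1 = x_{m+2}: the vector e_1 + e_{m+2}, then all other e_j."""
    n = 2 * m + 2
    first = [0] * n
    first[0] = first[m + 1] = 1          # 0-based positions of x_1 and x_{m+2}
    basis = [first]
    for j in range(n):
        if j not in (0, m + 1):
            v = [0] * n
            v[j] = 1
            basis.append(v)
    return basis


def restricted_gram(m):
    """Gram matrix of the restriction of (11.41) to the hyperplane, in the basis above."""
    b = hyperplane_basis(m)
    return matmul(matmul(b, gram_full(m)), transpose(b))


r = restricted_gram(1)   # [[1, 0, 0], [0, 0, 1/2], [0, 1/2, 0]]
```

The resulting matrix has a $1$ in the upper left corner, zeros in the rest of the first row and column, and a nonzero determinant, in agreement with the block form written above.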


We shall further investigate the $m$-dimensional subspaces $\mathsf{M} \subset \mathsf{L}$, isotropic with respect to the nonsingular quadratic form $F$, which is the restriction to the hyperplane $\mathsf{L}$ of the nonsingular quadratic form $\overline{F}$ given in the surrounding space $\overline{\mathsf{L}}$. Since in the complex space $\mathsf{L}$ all nonsingular quadratic forms are equivalent, it follows that all our results will be valid for an arbitrary nonsingular quadratic form on $\mathsf{L}$.

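Let us consider an arbitrary $(m + 1)$-dimensional subspace $\overline{\mathsf{M}} \subset \overline{\mathsf{L}}$, isotropic with respect to $\overline{F}$, and let us set $\mathsf{M} = \overline{\mathsf{M}} \cap \mathsf{L}$. It is obvious that the subspace $\mathsf{M} \subset \mathsf{L}$ is isotropic with respect to $F$. Since in the space $\overline{\mathsf{L}}$, the hyperplane $\mathsf{L}$ is defined by a single linear equation, it follows that either $\overline{\mathsf{M}} \subset \mathsf{L}$ (and then $\mathsf{M} = \overline{\mathsf{M}}$), or $\dim \mathsf{M} = \dim \overline{\mathsf{M}} - 1 = m$. But the first case is impossible, since $\dim \mathsf{M} \le \tfrac{1}{2} \dim \mathsf{L} = \tfrac{1}{2}(2m + 1)$, and $\dim \overline{\mathsf{M}} = m + 1$. Thus there remains the second case: $\dim \mathsf{M} = m$. Let us show that such an association with an $(m + 1)$-dimensional isotropic subspace $\overline{\mathsf{M}} \subset \overline{\mathsf{L}}$ of an $m$-dimensional isotropic subspace $\mathsf{M} \subset \mathsf{L}$ gives all the subspaces $\mathsf{M}$ of interest to us and in a certain sense, it is unique.

Theorem 11.23 For every $m$-dimensional subspace $\mathsf{M} \subset \mathsf{L}$ isotropic with respect to $F$, there exists an $(m + 1)$-dimensional subspace $\overline{\mathsf{M}} \subset \overline{\mathsf{L}}$, isotropic with respect to $\overline{F}$, such that $\mathsf{M} = \overline{\mathsf{M}} \cap \mathsf{L}$. Moreover, in each of the families $\mathfrak{M}_1$ and $\mathfrak{M}_2$ of subspaces isotropic with respect to $\overline{F}$, there exists such an $\overline{\mathsf{M}}$, and it is unique.

Proof Let us consider an arbitrary $m$-dimensional subspace $\mathsf{M} \subset \mathsf{L}$, isotropic with respect to $F$, and let us denote by $\mathsf{M}^{\perp}$ its orthogonal complement with respect to the symmetric bilinear form $\overline{\varphi}$ associated with the quadratic form $\overline{F}$ in the surrounding space $\overline{\mathsf{L}}$. According to our previous notation, it should have been denoted by $\mathsf{M}^{\perp}_{\overline{\varphi}}$, but we shall suppress the subscript, since the bilinear form $\overline{\varphi}$ will be always one and the same. From relationship (7.75), which is valid for a nondegenerate (with respect to the form $\overline{\varphi}$) space $\overline{\mathsf{L}}$ and an arbitrary subspace of it (p. 267), it follows that

\[ \dim \mathsf{M}^{\perp} = \dim \overline{\mathsf{L}} - \dim \mathsf{M} = 2m + 2 - m = m + 2. \]

Let us denote by $\widetilde{\varphi}$ the restriction of the bilinear form $\overline{\varphi}$ to $\mathsf{M}^{\perp}$, and by $\widetilde{F}$ the restriction of the quadratic form $\overline{F}$ to $\mathsf{M}^{\perp}$. The forms $\widetilde{\varphi}$ and $\widetilde{F}$ are singular in general. By definition (p. 198), the radical of the bilinear form $\widetilde{\varphi}$ is equal to $\mathsf{M}^{\perp} \cap (\mathsf{M}^{\perp})^{\perp} = \mathsf{M}^{\perp} \cap \mathsf{M}$. But since $\mathsf{M}$ is isotropic, it follows that $\mathsf{M} \subset \mathsf{M}^{\perp}$, and therefore, the radical of the bilinear form $\widetilde{\varphi}$ coincides with $\mathsf{M}$. By relationship (6.17) from Sect. 6.2, the rank of the bilinear form $\widetilde{\varphi}$ is equal to

\[ \dim \mathsf{M}^{\perp} - \dim (\mathsf{M}^{\perp})^{\perp} = \dim \mathsf{M}^{\perp} - \dim \mathsf{M} = (m + 2) - m = 2, \]

and in the subspace $\mathsf{M}^{\perp}$, we may choose a basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{m+2}$ such that its last $m$ vectors are contained in $\mathsf{M}$ (that is, in the radical of $\widetilde{\varphi}$), and the restriction of $\widetilde{\varphi}$ to $\langle \boldsymbol{e}_1, \boldsymbol{e}_2 \rangle$ has matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

Thus we have the decomposition $\mathsf{M}^{\perp} = \langle \boldsymbol{e}_1, \boldsymbol{e}_2 \rangle \oplus \mathsf{M}$, where the restriction of the quadratic form $\overline{F}$ to $\langle \boldsymbol{e}_1, \boldsymbol{e}_2 \rangle$ in our basis has the form $x_1 x_2$, and the restriction of $\overline{F}$ to $\mathsf{M}$ is identically equal to zero.

Let us set $\overline{\mathsf{M}}_i = \mathsf{M} \oplus \langle \boldsymbol{e}_i \rangle$, $i = 1, 2$. Then $\overline{\mathsf{M}}_1$ and $\overline{\mathsf{M}}_2$ are $(m + 1)$-dimensional subspaces in $\overline{\mathsf{L}}$. It follows from this construction that the $\overline{\mathsf{M}}_i$ are isotropic with respect to the bilinear form $\overline{\varphi}$. Here $\overline{\mathsf{M}}_i \cap \mathsf{L} = \mathsf{M}$, since on the one hand, from considerations of dimensionality, $\overline{\mathsf{M}}_i \not\subset \mathsf{L}$, and on the other hand, $\mathsf{M} \subset \overline{\mathsf{M}}_i$ and $\mathsf{M} \subset \mathsf{L}$. We have thus constructed two isotropic subspaces $\overline{\mathsf{M}}_i \subset \overline{\mathsf{L}}$ such that $\overline{\mathsf{M}}_i \cap \mathsf{L} = \mathsf{M}$. That they belong to different families $\mathfrak{M}_i$, and that in neither of these families are there any other subspaces with these properties, follows from Corollary 11.21. □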

Thus we have shown that there exists a bijection between the set of $m$-dimensional
isotropic subspaces $M \subset L$ and each of the families $\mathfrak{M}_i$ of $(m+1)$-dimensional
isotropic subspaces $\overline{M} \subset \overline{L}$. This fact is expressed by saying that
$m$-dimensional subspaces $M \subset L$ isotropic with respect to a nonsingular quadratic form
$F$ form a single family.
Of course, our partition of the set of isotropic subspaces into families is a matter
of convention. It is mostly a tribute to tradition originating in the special cases
considered in analytic geometry. However, it is possible to give a more precise meaning
to this partition by describing these subspaces in terms of Plücker coordinates.

In the previous chapter, we showed that $k$-dimensional subspaces $M$ of an
$n$-dimensional space $L$ are in one-to-one correspondence with the points of some
projective algebraic variety $G(k, n)$, called the Grassmannian. Suppose we are given
some nonsingular quadratic form $F$ on the space $L$. Let us denote by $I(k, n)$ the
subset of points of the Grassmannian $G(k, n)$ that correspond to the $k$-dimensional
isotropic subspaces.

We shall state the following propositions without proof, since they relate not to
linear algebra, but rather to algebraic geometry. 4 

Proposition 11.24 The set $I(k, n)$ is a projective algebraic variety.

In other words, this proposition asserts that the property of a subspace being
isotropic can be described by certain homogeneous relationships among its Plücker
coordinates.

A projective algebraic variety $X$ is said to be irreducible if it cannot be
represented in the form of a union $X = X_1 \cup X_2$, where the $X_i$ are projective algebraic
varieties different from $X$ itself.

Suppose the space $L$ has odd dimension $n = 2m + 1$.

Proposition 11.25 The set $I(m, 2m + 1)$ is an irreducible projective algebraic
variety.

Now let the space $L$ have even dimension $n = 2m$. We shall denote by $I_i(m, 2m)$
the subset of the projective algebraic variety $I(m, 2m)$ whose points correspond to
$m$-dimensional isotropic subspaces of the family $\mathfrak{M}_i$. Theorem 11.19 and its
corollaries show that

$$I(m, 2m) = I_1(m, 2m) \cup I_2(m, 2m), \qquad I_1(m, 2m) \cap I_2(m, 2m) = \varnothing.$$

This suggests the idea that the projective algebraic variety /(m, 2m) is reducible. 

Proposition 11.26 The sets $I_i(m, 2m)$, $i = 1, 2$, are irreducible projective algebraic
varieties.

Finally, we have the following assertion, which relates to the isotropy of a
subspace whose dimension is less than maximal.

Proposition 11.27 For all $k < n/2$, the projective algebraic variety $I(k, n)$ is
irreducible.


4 The reader can find them, for example, in the book Methods of Algebraic Geometry, by Hodge 
and Pedoe (Cambridge University Press, 1994). 
11.4 Quadrics in a Real Projective Space


Let us consider a projective space $P(L)$, where $L$ is a real vector space. As before, we
shall restrict our attention to the case of nonsingular quadrics. As we saw in Sect. 6.3
(formula (6.28)), a nonsingular quadratic form in a real space has a canonical form,
so that the equation of the quadric can be written as

$$x_0^2 + x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_n^2 = 0. \tag{11.43}$$

Here the index of inertia $r = s + 1$ will be the same in every coordinate system in
which the quadric is given by the canonical equation.

If we multiply equation (11.43) by $-1$, we obviously do not change the quadric
that it defines, and therefore, we may assume that $s + 1 \ge n - s$, that is, $s \ge
(n - 1)/2$. Moreover, $s \le n$, but in the case $s = n$, from equation (11.43) we obtain
$x_0 = 0, x_1 = 0, \dots, x_n = 0$, and there is no such point in projective space.

Thus, in contrast to a complex projective space, in a real projective space of given
dimension $n$, there exists (up to a projective transformation) not one, but several
nonsingular quadrics. However, there is only a finite number of them; they correspond
to various values of $s$, where we may assume that

$$\frac{n - 1}{2} \le s \le n - 1. \tag{11.44}$$


To be sure, it is still necessary to prove that the quadrics corresponding to the various
values of $s$ are not projectively equivalent. But we shall consider this question (in
an even more complex situation) in the next section.

Thus the number of projectively inequivalent nonsingular quadrics in a real projective
space of dimension $n$ is equal to the number of integers $s$ satisfying inequality
(11.44). If $n$ is odd, $n = 2m + 1$, then inequality (11.44) gives $m \le s \le 2m$, and
the number of projectively inequivalent quadrics is equal to $m + 1$. And if $n$ is even,
$n = 2m$, then there are $m$ of them. In particular, for $n = 2$, all nonsingular quadrics
in the projective plane are projectively equivalent. The most typical example is the
circle $x^2 + y^2 = 1$, which is contained entirely in the affine part $x_2 \ne 0$ if the equation
is written as $x_0^2 + x_1^2 - x_2^2 = 0$ in homogeneous coordinates $(x_0 : x_1 : x_2)$ (here
inhomogeneous coordinates are expressed by the formulas $x = x_0/x_2$, $y = x_1/x_2$).
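
The count just described can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function name `admissible_s` is hypothetical. It lists the integers $s$ allowed by inequality (11.44) and compares their number with $m + 1$ for $n = 2m + 1$ and with $m$ for $n = 2m$.

```python
def admissible_s(n):
    """Integers s with (n - 1)/2 <= s <= n - 1, as in inequality (11.44)."""
    # The test 2*s >= n - 1 avoids fractions when n is even.
    return [s for s in range(n) if 2 * s >= n - 1]


for n in range(1, 7):
    values = admissible_s(n)
    print(f"n = {n}: s in {values} ({len(values)} value(s))")
```

For $n = 2$ the list is `[1]` (a single class, represented by the circle), and for $n = 3$ it is `[1, 2]`.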

In three-dimensional projective space, there exist two types of projectively
inequivalent quadrics. In homogeneous coordinates $(x_0 : x_1 : x_2 : x_3)$, one of them is
given by the equation $x_0^2 + x_1^2 + x_2^2 - x_3^2 = 0$. Here we always have $x_3 \ne 0$, the
quadric lies in the affine part, and it is given in inhomogeneous coordinates $(x, y, z)$
by the equation $x^2 + y^2 + z^2 = 1$, where $x = x_0/x_3$, $y = x_1/x_3$, $z = x_2/x_3$. This
quadric is a sphere. The second type is given by the equation $x_0^2 + x_1^2 - x_2^2 - x_3^2 = 0$.
This is a hyperboloid of one sheet.

Their projective inequivalence can be seen at the very least from the fact that
not a single real line lies on the first of them (the sphere), while on the second
(hyperboloid of one sheet), there are two families each consisting of an infinite
number of lines, called the rectilinear generatrices.

Of course, we can embed a real space $L$ into a complex space $L^{\mathbb{C}}$, and similarly,
embed $P(L)$ into $P(L^{\mathbb{C}})$. Therefore, everything that was said in Sect. 11.3 about
isotropic subspaces is applicable in our case. However, although our quadric is real,
the isotropic subspaces obtained in this way can turn out to be complex. The single
exception is the case in which $s = (n - 1)/2$ if the number $n$ is odd, or $s = n/2$ if
$n$ is even.

In the first instance, we may combine the coordinates into pairs $(x_i, x_{s+1+i})$ and
set $u_i = x_i + x_{s+1+i}$ and $v_i = x_i - x_{s+1+i}$. Then taking into account the equalities

$$x_i^2 - x_{s+1+i}^2 = (x_i + x_{s+1+i})(x_i - x_{s+1+i}),$$

equation (11.43) can be written in the form

$$u_0 v_0 + u_1 v_1 + \cdots + u_s v_s = 0. \tag{11.45}$$

But this is the case of the quadric (11.33), which we considered in the previous 
section. It is easy to see that the reasoning used in Sect. 11.3 gives us a description 
of the real subspaces of a quadric. 
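
As a quick sanity check of the change of coordinates leading to (11.45), here is a small Python sketch (the function names are hypothetical, not taken from the text). It assumes $n = 2s + 1$, pairs $x_i$ with $x_{s+1+i}$, and compares the left-hand side of (11.43) with $u_0 v_0 + \cdots + u_s v_s$ on an integer vector.

```python
def canonical_form(x, s):
    """Left-hand side of (11.43): s + 1 positive squares, the rest negative."""
    return sum(t * t for t in x[: s + 1]) - sum(t * t for t in x[s + 1 :])


def paired_form(x, s):
    """Left-hand side of (11.45); assumes n = 2s + 1, i.e. len(x) == 2s + 2."""
    if len(x) != 2 * s + 2:
        raise ValueError("the pairing requires n = 2s + 1")
    u = [x[i] + x[s + 1 + i] for i in range(s + 1)]
    v = [x[i] - x[s + 1 + i] for i in range(s + 1)]
    return sum(ui * vi for ui, vi in zip(u, v))


x = [3, -1, 4, 1, -5, 9]  # n = 5, s = 2
print(canonical_form(x, 2), paired_form(x, 2))  # both values are -81
```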

The case $s = n/2$ for even $n$ also does not remove us from the realm of real
subspaces and also leads to the case considered in the previous section. Moreover, if the
equation of a quadric has the form (11.45) over an arbitrary field $K$ of characteristic
different from 2, then the reasoning from the previous section remains in force.

In the general case, it is still possible to determine the dimensions of the spaces
contained in a quadric. For this, we may make use of considerations already used in
the proof of the law of inertia (Theorem 6.17 from Sect. 6.3). There we observed that
the index of inertia (in the given case, the index of inertia of the quadratic form from
(11.43), equal to $s + 1$) coincides with the maximal dimension of the subspaces $L'$ on
which the restriction of the form is positive definite. (Let us note that this condition
gives a geometric characterization of the index of inertia, that is, it depends only on
the set of solutions of the equation $F(x) = 0$, and not on the form $F$ that defines it.)

Indeed, let the quadric $Q$ be given by the equation $F(x) = 0$. If the restriction
$F'$ of the form $F$ to the subspace $L'$ is positive definite, then it is clear
that $Q \cap P(L') = \varnothing$. Thus if we are dealing with a projective space $P(L)$, where
$\dim L = n + 1$, then in $L$ there exists a subspace $\overline{L}$ of dimension $s + 1$ such that the
restriction of the form $F$ to it is positive definite. This means that $Q \cap P(\overline{L}) = \varnothing$
(however, such a subspace $\overline{L}$ is also easily determined explicitly on the basis of
equation (11.43)). If $L' \subset L$ is a subspace such that $P(L') \subset Q$, then $L' \cap \overline{L} = (0)$.
Hence by Corollary 3.42, we obtain the inequality $\dim \overline{L} + \dim L' \le \dim L = n + 1$.
Consequently, $\dim L' + s + 1 \le n + 1$, and this means that $\dim L' \le n - s$. Thus
for the space $P(L')$ belonging to the quadric given by equation (11.43), we obtain
$\dim L' \le n - s$ and therefore $\dim P(L') \le n - s - 1$.

On the other hand, it is easy to produce a subspace of dimension $n - s - 1$ actually
belonging to the quadric (11.43). To this end, let us combine in pairs the unknowns
appearing in equation (11.43) with different signs and let us equate the unknowns
in each pair, for example $x_0 = x_{s+1}$, and so on. Since we have assumed that $s + 1 \ge
n - s$, we may form $n - s$ such pairs, and therefore, we obtain $n - s$ linear equations.
How many unknowns remain? Since we have combined $2(n - s)$ unknowns into
pairs, and in all there were $n + 1$ of them, there remain $n + 1 - 2(n - s)$ unknowns
(it is possible that this number will be equal to zero). Setting each of the remaining
unknowns equal to zero, we obtain altogether

$$(n - s) + n + 1 - 2(n - s) = n + 1 - (n - s)$$

linear equations in coordinates in the space $L$. Since different unknowns occur in
all these equations, these equations are linearly independent and determine in $L$ a
subspace $L'$ of dimension $n - s$. Then $\dim P(L') = n - s - 1$. Of course, since $P(L')$ is
contained in $Q$, an arbitrary subspace $P(L'') \subset P(L')$ for $L'' \subset L'$ is also contained
in $Q$. Thus in the quadric $Q$ are contained subspaces of all dimensions $r \le n - s - 1$.
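
The construction of $L'$ can be made concrete. The sketch below is an illustration only, with hypothetical function names: it builds the obvious basis of the subspace cut out by the equations $x_i = x_{s+1+i}$ (with the unpaired unknowns set to zero) and checks that the symmetric bilinear form of (11.43) vanishes on every pair of basis vectors, which (over the reals) is equivalent to the quadratic form vanishing on the whole subspace.

```python
def subspace_basis(n, s):
    """Basis of L': x_i = x_{s+1+i} for i = 0..n-s-1, other unknowns zero.

    Assumes the normalization (n - 1)/2 <= s <= n - 1 of inequality (11.44).
    """
    if not (2 * s >= n - 1 and s <= n - 1):
        raise ValueError("s must satisfy (n - 1)/2 <= s <= n - 1")
    basis = []
    for i in range(n - s):
        v = [0] * (n + 1)
        v[i] = 1          # unknown taken with a plus sign in (11.43)
        v[s + 1 + i] = 1  # its partner, taken with a minus sign
        basis.append(v)
    return basis


def phi(x, y, s):
    """Symmetric bilinear form whose quadratic form is the left side of (11.43)."""
    plus = sum(a * b for a, b in zip(x[: s + 1], y[: s + 1]))
    minus = sum(a * b for a, b in zip(x[s + 1 :], y[s + 1 :]))
    return plus - minus


basis = subspace_basis(3, 1)  # hyperboloid of one sheet in P^3
print(len(basis), all(phi(v, w, 1) == 0 for v in basis for w in basis))
```

For $n = 3$, $s = 1$ the basis has two vectors, so $P(L')$ is a line lying on the quadric.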

We have therefore proved the following result.

Theorem 11.28 If a nonsingular quadric $Q$ in a real projective space of dimension
$n$ is given by the equation $F(x_0, \dots, x_n) = 0$ and the index of inertia of the quadratic
form $F$ is equal to $s + 1$, then in $Q$ are contained projective subspaces only of
dimension $r \le n - s - 1$, and for each such number $r$ there can be found in $Q$ a
projective subspace of dimension $r$ (when $s + 1 \ge n - s$, which is always possible
to attain without changing the quadric $Q$, but changing only the quadratic form $F$
that determines it to $-F$).

We have already considered an example of a quadric in real three-dimensional
projective space ($n = 3$). Let us note that in this space there are only two nonempty
quadrics: for $s = 1$ and $s = 2$.
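
To see what Theorem 11.28 says in this case, the following Python sketch (with a hypothetical function name) evaluates the bound $n - s - 1$ on the dimension of projective subspaces contained in the quadric, under the normalization $(n - 1)/2 \le s \le n - 1$.

```python
def max_projective_subspace_dim(n, s):
    """Largest dimension of a projective subspace lying on the quadric (11.43)."""
    if not (2 * s >= n - 1 and s <= n - 1):
        raise ValueError("s must satisfy (n - 1)/2 <= s <= n - 1")
    return n - s - 1


for s in (1, 2):
    print(f"n = 3, s = {s}: max dimension {max_projective_subspace_dim(3, s)}")
```

For $s = 2$ the bound is 0 (the sphere contains real points but no real lines), and for $s = 1$ it is 1 (the hyperboloid of one sheet contains real lines).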

For $s = 2$, equation (11.43) can be written in the form

$$x_0^2 + x_1^2 + x_2^2 = x_3^2. \tag{11.46}$$

As we have already said, for points of a real quadric, we have $x_3 \ne 0$. This means
that our quadric is entirely contained in this affine subset. Setting $x = x_0/x_3$, $y =
x_1/x_3$, $z = x_2/x_3$, we shall write its equation in the form

$$x^2 + y^2 + z^2 = 1.$$

This is the familiar two-dimensional sphere $S^2$ in three-dimensional Euclidean
space. Let us discover what lines lie on it. Of course, no real line can lie on a sphere,
since every line has points that are arbitrarily distant from the center of the sphere,
while for all points of the sphere, their distance from the center of the sphere is equal
to 1. Therefore, we can be talking only about complex lines of the space $P(L^{\mathbb{C}})$. If
in equation (11.46) we make the substitution $x_2 = iy$, where $i$ is the imaginary unit,
we obtain the equation $x_0^2 + x_1^2 - y^2 - x_3^2 = 0$, which in the new coordinates

$$u_0 = x_0 + y, \quad v_0 = x_0 - y, \quad u_1 = x_1 + x_3, \quad v_1 = x_1 - x_3,$$

takes the form

$$u_0 v_0 + u_1 v_1 = 0. \tag{11.47}$$
Fig. 11.3 Hyperboloid of one sheet



We studied such an equation in Sect. 11.2 (see Example 11.11). As an example
of a line lying in the given quadric, we may take the line given by equations (11.25):
$u_0 = \lambda u_1$, $v_0 = -\lambda^{-1} v_1$ with arbitrary complex number $\lambda \ne 0$ and arbitrary $u_1$, $v_1$.
In general, such a line contains not a single real point of our quadric (that is, points
corresponding to real values of the coordinates $x_0, \dots, x_3$). Indeed, if the number $\lambda$
is not real, then the equality $u_0 = \lambda u_1$ contradicts the fact that $u_0$ and $u_1$ are real.
The case $u_0 = u_1 = 0$ would correspond to a point with coordinates $x_1 = x_3 = 0$,
for which $x_0^2 + x_2^2 = 0$, that is, all $x_i$ are equal to zero.
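
The claim that such a line lies on the sphere can be tested numerically. The sketch below is an illustration under stated assumptions: it takes the line in the form $u_0 = \lambda u_1$, $v_0 = -\lambda^{-1} v_1$ (our reading of equations (11.25)), inverts the change of coordinates $u_0 = x_0 + y$, $v_0 = x_0 - y$, $u_1 = x_1 + x_3$, $v_1 = x_1 - x_3$ with $x_2 = iy$, and evaluates $x_0^2 + x_1^2 + x_2^2 - x_3^2$. The function names are hypothetical.

```python
def point_on_line(lam, u1, v1):
    """Homogeneous coordinates (x0, x1, x2, x3) of a point of the line.

    Assumes the line is given by u0 = lam * u1, v0 = -v1 / lam with lam != 0.
    """
    u0 = lam * u1
    v0 = -v1 / lam
    x0 = (u0 + v0) / 2
    y = (u0 - v0) / 2
    x1 = (u1 + v1) / 2
    x3 = (u1 - v1) / 2
    x2 = 1j * y  # the substitution x2 = i*y
    return (x0, x1, x2, x3)


def sphere_form(p):
    """Left-hand side of x0^2 + x1^2 + x2^2 - x3^2 = 0, that is, of (11.46)."""
    x0, x1, x2, x3 = p
    return x0**2 + x1**2 + x2**2 - x3**2


p = point_on_line(2 + 1j, 1 - 3j, 0.5 + 2j)
print(abs(sphere_form(p)) < 1e-9)  # True up to floating-point rounding
```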

Thus on the sphere lies a set of complex lines containing not a single real point.
If desired, all of them could be described by formulas (11.27) and (11.28) after
changes in coordinates that we described earlier. However, of greater interest are
the complex lines lying on the sphere and containing at least one real point. For
each such line $l$ containing a real point $P$ of the sphere, the complex conjugate line
$\overline{l}$ (that is, the line consisting of the points $\overline{Q}$, where $Q$ runs over the line $l$) also lies on
the sphere and contains the point $P$. But by Theorem 11.19, through every point
$P$ pass exactly two lines (even if complex). We see that through every point of the
sphere there pass exactly two complex lines, which are the complex conjugates of
each other.

Finally, the case $s = 1$ leads to the equation

$$x_0^2 + x_1^2 - x_2^2 - x_3^2 = 0, \tag{11.48}$$

which after a change of coordinates

$$u_0 = x_0 + x_2, \quad v_0 = x_0 - x_2, \quad u_1 = x_1 + x_3, \quad v_1 = x_1 - x_3,$$

also assumes the form (11.47). For this equation, we have described all the lines
contained in a quadric by formulas (11.27) and (11.28), where clearly, real values must
be assigned to the parameters $a, b, c, d$ in these formulas. In this case, the obtained
quadric is a hyperboloid of one sheet, and the lines are its rectilinear generatrices.
See Fig. 11.3.

Let us visualize what this surface looks like; that is, let us find a more familiar
set that is homeomorphic to this surface. To this end, let us choose one line in each
family of rectilinear generatrices: in the first, $l_0$; in the second, $l_1$. As we saw in
Sect. 9.4, every projective line is homeomorphic to the circle $S^1$. On the other hand,
every line in the second family of generatrices is uniquely determined by its point of
intersection with the line $l_0$, and similarly, every line of the first family is determined
by its point of intersection with the line $l_1$. Finally, through every point of the surface
pass exactly two lines: one from the first family of generatrices, and the other from
the second.

Fig. 11.4 A torus

Thus a bijection is established between the points of the quadric given by equation
(11.48) and pairs of points $(x, y)$, where $x \in l_0$, $y \in l_1$, that is, the set $S^1 \times S^1$.
It is easily ascertained that this bijection is a homeomorphism. The set $S^1 \times S^1$ is
called a torus. It is most simply represented as the surface obtained by rotating a
circle about an axis lying in the same plane as the circle but not intersecting it. See
Fig. 11.4. Such a surface looks like the surface of a bagel. As a result, we obtain that
the quadric given by equation (11.48) in three-dimensional real projective space is
homeomorphic to a torus.


11.5 Quadrics in a Real Affine Space 

Now we proceed to the study of quadrics in a real affine space $(V, L)$. Let us choose
in this space a frame of reference $(O; e_1, \dots, e_n)$. Then every point $A \in V$ is given
by its coordinates $(x_1, \dots, x_n)$. A quadric is the set of all points $A \in V$ such that

$$F(x_1, \dots, x_n) = 0, \tag{11.49}$$

where $F$ is some second-degree polynomial. There is now no reason to consider the
polynomial $F$ to be homogeneous (as was the case in a projective space).

Collecting in $F(x)$ terms of the second, first, and zeroth degrees, we shall write
them in the form

$$F(x) = \psi(x) + f(x) + c, \tag{11.50}$$

where $\psi(x)$ is a quadratic form, $f(x)$ is a linear form, and $c$ is a scalar. The quadrics
$F(x) = 0$ thus obtained for $n = 2$ and 3 represent the curves and surfaces of order
two studied in courses in analytic geometry.

Let us note that according to our definition of a quadric as a set of points satisfying
relationship (11.49), we obtain even in the simplest cases, $n = 2$ and 3, sets that
generally do not belong to curves or surfaces of degree two. The same "strange"
examples show that dissimilar-looking second-degree polynomials can define one
and the same quadric, that is, the solution set of equation (11.49).

For example, in real three-dimensional space with coordinates $x, y, z$, the equation
$x^2 + y^2 + z^2 + c = 0$ has no solution in $x, y, z$ if $c > 0$, and therefore for any
$c > 0$, it defines the empty set. Another example is the equation $x^2 + y^2 = 0$, which
is satisfied only with $x = y = 0$ but for all $z$, that is, this equation defines a line,
namely the $z$-axis. But the same line ($z$-axis) is defined, for example, by the equation
$ax^2 + by^2 = 0$ with any nonzero numbers $a$ and $b$ of the same sign.

Let us prove that if we exclude such "pathological" cases, then every quadric is
defined by an equation that is unique up to a nonzero constant factor. Here it will be
convenient to consider the empty set a special case of an affine subspace.

Theorem 11.29 If a quadric $Q$ does not coincide with the set of points of any affine
subspace and can be given by two different equations $F_1(x) = 0$ and $F_2(x) = 0$,
where the $F_i$ are second-degree polynomials, then $F_2 = \lambda F_1$, where $\lambda$ is some
nonzero real number.

Proof Since by the given condition, the quadric $Q$ is not empty, it must contain
some point $A$. By Theorem 8.14, there exists another point $B \in Q$ such that the line
$l$ passing through $A$ and $B$ does not lie entirely in $Q$.

Let us select in the affine space $V$ a frame of reference $(O; e_1, \dots, e_n)$ in which
the point $O$ is equal to $A$ and the vector $e_1$ is equal to $\overrightarrow{AB}$. The line passing through
the points $A$ and $B$ consists of points with coordinates $(x_1, 0, \dots, 0)$ for all possible
real values $x_1$. Let us write down the equation $F_i(x) = 0$, $i = 1, 2$, defining our
quadric after arranging terms in order of the degree of $x_1$. As a result, we obtain the
equations

$$F_i(x_1, \dots, x_n) = a_i x_1^2 + f_i(x_2, \dots, x_n)\, x_1 + \psi_i(x_2, \dots, x_n) = 0, \quad i = 1, 2,$$

where $f_i(x_2, \dots, x_n)$ and $\psi_i(x_2, \dots, x_n)$ are inhomogeneous polynomials of first
and second degree in the variables $x_2, \dots, x_n$. Writing $f_i(\bar{0}) = f_i(0, \dots, 0)$
and $\psi_i(\bar{0}) = \psi_i(0, \dots, 0)$, we may say that the relationship

$$a_i x_1^2 + f_i(\bar{0})\, x_1 + \psi_i(\bar{0}) = 0 \tag{11.51}$$

holds for $x_1 = 0$ (point $A$) and for $x_1 = 1$ (point $B$), but does not hold identically
for all real values $x_1$. From this it follows that $\psi_i(\bar{0}) = 0$ and $a_i + f_i(\bar{0}) = 0$. This
means that $a_i \ne 0$, for otherwise, we would obtain that relationship (11.51) was
satisfied for all $x_1$. By multiplying the polynomial $F_i$ by $a_i^{-1}$, we may assume that
$a_i = 1$.

Let us denote by $\bar{x}$ the projection of the vector $x$ onto the subspace $\langle e_2, \dots, e_n \rangle$
parallel to the subspace $\langle e_1 \rangle$, that is, $\bar{x} = (x_2, \dots, x_n)$. Then we may say that the
two equations

$$x_1^2 + f_1(\bar{x})\, x_1 + \psi_1(\bar{x}) = 0 \quad \text{and} \quad x_1^2 + f_2(\bar{x})\, x_1 + \psi_2(\bar{x}) = 0, \tag{11.52}$$

where $f_i(\bar{x})$ are first-degree polynomials and $\psi_i(\bar{x})$ are second-degree polynomials
of the vector $\bar{x}$, have identical solutions. Furthermore, we know that they both
have two solutions, $x_1 = 0$ and $x_1 = 1$, for $\bar{x} = \bar{0}$, that is, the discriminant of each
quadratic trinomial

$$p_i(x_1) = x_1^2 + f_i(\bar{x})\, x_1 + \psi_i(\bar{x}), \quad i = 1, 2,$$

with coefficients depending on the vector $\bar{x}$, for $\bar{x} = \bar{0}$, is positive.

The coefficients of the trinomial $p_i(x_1)$ can be viewed as polynomials in the
variables $x_2, \dots, x_n$, that is, the coordinates of the vector $\bar{x}$. Consequently, the
discriminant of the trinomial $p_i(x_1)$ is also a polynomial in the variables $x_2, \dots, x_n$,
and therefore, it depends on them continuously. From the definition of continuity,
it follows that there exists a number $\varepsilon > 0$ such that the discriminant of each
trinomial $p_i(x_1)$ is positive for all $\bar{x}$ such that $|x_2| < \varepsilon, \dots, |x_n| < \varepsilon$. This condition
can be written compactly in the form of the single inequality $|\bar{x}| < \varepsilon$, assuming that
the space of vectors $\bar{x}$ is somehow converted into a Euclidean space in which the
length $|\bar{x}|$ of a vector is defined. For example, it can be defined by the relationship
$|\bar{x}|^2 = x_2^2 + \cdots + x_n^2$.

Thus the quadratic trinomials $p_i(x_1)$ with leading coefficient 1 and coefficients
$f_i(\bar{x})$ and $\psi_i(\bar{x})$, depending continuously on $\bar{x}$, each have two roots for all $|\bar{x}| < \varepsilon$,
and since equations (11.52) have identical solutions, these roots are the same for both
trinomials. But as is known from elementary algebra, such trinomials coincide. Therefore,
$f_1(\bar{x}) = f_2(\bar{x})$ and $\psi_1(\bar{x}) = \psi_2(\bar{x})$ for all $|\bar{x}| < \varepsilon$. Hence on the basis of the
following lemma, we obtain that these equalities are satisfied not only for $|\bar{x}| < \varepsilon$, but
in general for all vectors $\bar{x}$. □

Lemma 11.30 If for some number $\varepsilon > 0$, the polynomials $f(\bar{x})$ and $g(\bar{x})$ coincide
for all $\bar{x}$ such that $|\bar{x}| < \varepsilon$, then they coincide identically for all $\bar{x}$.

Proof Let us represent each of the polynomials $f(\bar{x})$ and $g(\bar{x})$ as a sum of
homogeneous terms:

$$f(\bar{x}) = \sum_{k=0}^{N} f_k(\bar{x}), \qquad g(\bar{x}) = \sum_{k=0}^{N} g_k(\bar{x}). \tag{11.53}$$

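Let us set $\bar{x} = \alpha \bar{y}$, where $|\bar{y}| < \varepsilon$ and the number $\alpha$ is in $[0, 1]$. Then the condition
$|\bar{x}| < \varepsilon$ is clearly satisfied, and this means that $f(\bar{x}) = g(\bar{x})$. Setting $\bar{x} = \alpha \bar{y}$ in
equality (11.53), we obtain

$$\sum_{k=0}^{N} \alpha^k f_k(\bar{y}) = \sum_{k=0}^{N} \alpha^k g_k(\bar{y}). \tag{11.54}$$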

On the one hand, equality (11.54) holds for all $\alpha \in [0, 1]$, of which there are
infinitely many. On the other hand, (11.54) represents an equality between two
polynomials in the variable $\alpha$. As is well known, polynomials of a single variable taking
the same values for an infinite number of values of the variable coincide identically,
that is, they have the same coefficients. Therefore, we obtain the equalities
$f_k(\bar{y}) = g_k(\bar{y})$ for all $k = 0, \dots, N$ and all $\bar{y}$ for which $|\bar{y}| < \varepsilon$. But since the
polynomials $f_k$ and $g_k$ are homogeneous, it follows that these equalities hold in general
for all $\bar{y}$.

Indeed, every vector $\bar{y}$ can be represented in the form $\bar{y} = \alpha \bar{z}$ with some scalar
$\alpha$ and vector $\bar{z}$ for which $|\bar{z}| < \varepsilon$. For example, it suffices to set $\alpha = (2/\varepsilon)|\bar{y}|$.
Consequently, we obtain $f_k(\bar{z}) = g_k(\bar{z})$. But if we multiply both sides of this
equality by $\alpha^k$ and invoke the homogeneity of $f_k$ and $g_k$, we obtain the equality
$f_k(\alpha \bar{z}) = g_k(\alpha \bar{z})$, that is, $f_k(\bar{y}) = g_k(\bar{y})$, which is what was to be proved. □

Let us note that we might have posed this same question about the uniqueness
of the correspondence between quadrics and their defining equations with regard
to quadrics in projective space. But in projective space, the polynomial defining a
quadric is homogeneous, and this question can be resolved even more easily. So that
we would not have to repeat ourselves, we have considered the question in the more
complex situation.

Let us now investigate a question that is considered already in a course in analytic
geometry for spaces of dimension 2 and 3: into what simplest form can equation
(11.49) be brought by a suitable choice of frame of reference in an affine space
of arbitrary dimension $n$? This question is equivalent to the following: under what
conditions can two quadrics be transformed into each other by a nonsingular affine
transformation?

We shall consider quadrics in an affine space $(V, L)$ of dimension $n$, assuming
that for smaller values of $n$, this problem has already been solved. In this regard, we
shall not consider quadrics that are cylinders, that is, having the form

$$Q = h^{-1}(Q'),$$

where $(h, \mathcal{A})$ is an affine transformation of the space $(V, L)$ into the affine space
$(V', L')$ of dimension $m < n$, and $Q'$ is some subset of $V'$. Let us ascertain that in
this case, $Q'$ is a quadric in $V'$.

Let the quadric $Q$ in a coordinate system associated with some frame of reference
of the affine space $V$ be defined by the second-degree equation $F(x_1, \dots, x_n) = 0$.
Let us choose in the $m$-dimensional affine space $V'$ some frame of reference
$(O'; e'_1, \dots, e'_m)$. Then $e'_1, \dots, e'_m$ is a basis in the vector space $L'$. In the
definition of a cylinder, one has the condition $\mathcal{A}(L) = L'$. Let us denote by $e_1, \dots, e_m$
vectors $e_i \in L$ such that $\mathcal{A}(e_i) = e'_i$, $i = 1, \dots, m$, and let us consider the subspace
$M = \langle e_1, \dots, e_m \rangle$ that they span. By Corollary 3.31, there exists a subspace $N \subset L$
such that $L = M \oplus N$. Let $O \in V$ be an arbitrary point such that $h(O) = O'$. Then
in the coordinate system associated with the frame of reference $(O'; e'_1, \dots, e'_m)$,
the projection of the space $L$ onto $M$ parallel to the subspace $N$ and the associated
projection $h$ of the affine space $V$ onto $V'$ are defined by the condition

$$h(x_1, \dots, x_n) = (x'_1, \dots, x'_m),$$

where $x'_i$ are the coordinates in the associated frame of reference
$(O'; e'_1, \dots, e'_m)$. Then the fact that $Q = h^{-1}(Q')$ is a quadric means that its
second-degree equation $F(x_1, \dots, x_n) = 0$ is satisfied irrespective of the values that
we have substituted for the variables $x_{m+1}, \dots, x_n$ if the point with coordinates
$(x_1, \dots, x_m)$ belongs to the set $Q'$. For example, we may set $x_{m+1} = 0, \dots, x_n = 0$.
Then the equation $F(x'_1, \dots, x'_m, 0, \dots, 0) = 0$ will be precisely the equation of the
quadric $Q'$.

The same reasoning shows that if a polynomial $F$ depends on fewer than $n$
unknowns, then the quadric $Q$ defined by the equation $F(x) = 0$ is a cylinder. Therefore,
in the sequel we shall consider only quadrics that are not cylinders. Our goal
will be the classification of these quadrics using nonsingular affine transformations.
Two quadrics that can be mapped one into the other by such a transformation are
said to be affinely equivalent.

First of all, let us consider the effect of a translation on the equation of a quadric.
Let the equation of the quadric $Q$ in coordinates associated with some frame of
reference $(O; e_1, \dots, e_n)$ have the form

$$F(x) = \psi(x) + f(x) + c = 0, \tag{11.55}$$

where $\psi(x)$ is a quadratic form, $f(x)$ is a linear form, and $c$ is a number. If $T_a$ is a
translation by the vector $a \in L$, then the quadric $T_a(Q)$ is given by the equation

$$\psi(x + a) + f(x + a) + c = 0.$$

Let us consider how the equation of a quadric is transformed under these conditions.
Let $\varphi(x, y)$ be the symmetric bilinear form associated with the quadratic form $\psi(x)$,
that is, $\psi(x) = \varphi(x, x)$. Then

$$\psi(x + a) = \varphi(x + a, x + a) = \varphi(x, x) + 2\varphi(x, a) + \varphi(a, a)
= \psi(x) + 2\varphi(x, a) + \psi(a).$$

As a result, we obtain that after a translation $T_a$:

(a) The quadratic part $\psi(x)$ does not change.

(b) The linear part $f(x)$ is replaced by $f(x) + 2\varphi(x, a)$.

(c) The constant term $c$ is replaced by $c + f(a) + \psi(a)$.
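
Rules (a), (b), and (c) can be verified on a concrete example. In the following Python sketch (an illustration; all names are hypothetical), the quadratic form $\psi$ is given by a symmetric integer matrix `S`, the linear form $f$ by a coefficient list `b`, and the function `translated_coefficients` returns the coefficients of the polynomial $\psi(x + a) + f(x + a) + c$ according to (a)-(c).

```python
def phi(S, x, y):
    """Symmetric bilinear form with matrix S."""
    n = len(x)
    return sum(S[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def evaluate(S, b, c, x):
    """Value of F(x) = psi(x) + f(x) + c with psi(x) = phi(x, x)."""
    return phi(S, x, x) + sum(bi * xi for bi, xi in zip(b, x)) + c


def translated_coefficients(S, b, c, a):
    """Coefficients of x -> F(x + a), following rules (a), (b), (c)."""
    n = len(a)
    # (b): the coefficient of x_j in 2*phi(x, a) is 2 * sum_i S[j][i] * a[i].
    new_b = [b[j] + 2 * sum(S[j][i] * a[i] for i in range(n)) for j in range(n)]
    # (c): c is replaced by c + f(a) + psi(a).
    new_c = c + sum(bi * ai for bi, ai in zip(b, a)) + phi(S, a, a)
    return S, new_b, new_c  # (a): the quadratic part is unchanged


S, b, c = [[1, 2], [2, -3]], [4, -1], 5
a, x = [2, 1], [3, -2]
shifted = [xi + ai for xi, ai in zip(x, a)]
S2, b2, c2 = translated_coefficients(S, b, c, a)
print(evaluate(S, b, c, shifted), evaluate(S2, b2, c2, x))  # 28 28
```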

Using statement (b), then with the aid of a translation $T_a$, it is sometimes possible
to eliminate the first-degree terms in the equation of a quadric. More precisely, this
is possible if there exists a vector $a \in L$ such that

$$f(x) = -2\varphi(x, a) \tag{11.56}$$

for an arbitrary $x \in L$. By Theorem 6.3, any bilinear form $\varphi(x, y)$ can be represented
in the form $\varphi(x, y) = (x, \mathcal{A}(y))$ via some linear transformation $\mathcal{A} : L \to L^*$.
Then condition (11.56) can be written in the form $(x, f) = -2(x, \mathcal{A}(a))$ for all
$x \in L$, that is, in the form $f = -2\mathcal{A}(a) = \mathcal{A}(-2a)$. This means that the condition
(11.56) amounts to the linear function $f \in L^*$ being contained in the image of the
transformation $\mathcal{A}$.
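
In coordinates, condition (11.56) is a system of linear equations for the vector $a$. The sketch below is an illustration under simplifying assumptions: the dimension is 2, the quadratic form has a nonsingular symmetric matrix `S` (so that a solution always exists), and $f$ has the coefficient list `b`. The function name is hypothetical. It solves $2Sa = -b$ exactly by Cramer's rule.

```python
from fractions import Fraction


def shift_removing_linear_part(S, b):
    """Solve 2*S*a = -b for a 2x2 symmetric matrix S; None if S is singular.

    A singular S may still admit a solution (f in the image of the map),
    but that case is deliberately left out of this sketch.
    """
    (p, q), (q2, r) = S
    det = p * r - q * q2
    if det == 0:
        return None
    h0, h1 = Fraction(-b[0], 2), Fraction(-b[1], 2)  # right-hand side -b/2
    a0 = (h0 * r - q * h1) / det
    a1 = (p * h1 - q2 * h0) / det
    return (a0, a1)


S, b = [[1, 2], [2, -3]], [4, -1]
a = shift_removing_linear_part(S, b)
# New linear part after translating by a: b_j + 2 * sum_i S[j][i] * a[i].
new_b = [b[j] + 2 * sum(S[j][i] * a[i] for i in range(2)) for j in range(2)]
print(a, new_b)
```

Here the translation vector is $a = (-5/7, -9/14)$, and the new linear part is identically zero.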
First of all, let us investigate those quadrics for which condition (11.56) is
satisfied. In this case, there exists a frame of reference of the affine space in which the
quadric can be represented by the equation

$$F(x) = \psi(x) + c = 0. \tag{11.57}$$

This equation exhibits a remarkable symmetry: it is invariant under a change of the
vector $x$ into $-x$. Let us investigate this further.

Definition 11.31 Let $V$ be an affine space and $A$ a point of $V$. A central symmetry
with respect to a point $A$ is a mapping $V \to V$ that maps each point $B \in V$ to the
point $B' \in V$ such that $\overrightarrow{AB'} = -\overrightarrow{AB}$.

It is obvious that by this condition, the point $B'$, and therefore the mapping,
is uniquely determined. A trivial verification shows that this mapping is an affine
transformation and its linear part is equal to $-\mathcal{E}$.

Definition 11.32 A set $Q \subset V$ is said to be centrally symmetric with respect to a
point $A \in V$ if it is invariant under a central symmetry with respect to the point $A$,
which in this case is called the center of the set $Q$.

It follows from the definition that a point $A$ is a center of a quadric if and only
if the quadric is transformed into itself by the linear transformation $-\mathcal{E}$, that is,
$x \mapsto -x$, where $x = \overrightarrow{AX}$ for every point $X$ of this quadric.

Theorem 11.33 If a quadric does not coincide with an affine subspace, is not a
cylinder, and has a center, then the center is unique.

Proof Let $A$ and $B$ be two distinct centers of the quadric $Q$. This means, by
definition, that for every point $X \in Q$, there exists a point $X' \in Q$ such that

$$\overrightarrow{AX} = -\overrightarrow{AX'}, \tag{11.58}$$

and for every point $Y \in Q$, there exists a point $Y' \in Q$ such that

$$\overrightarrow{BY} = -\overrightarrow{BY'}. \tag{11.59}$$


Let us apply relationship (11.58) to an arbitrary point $X \in Q$, and relationship
(11.59) to the associated point $X' = Y$. Let us denote the point $Y'$ obtained as a
result of these actions by $X''$. It is obvious that

$$\overrightarrow{XX''} = \overrightarrow{XA} + \overrightarrow{AB} + \overrightarrow{BX''}, \tag{11.60}$$

and from relationships (11.58) and (11.59), it follows that $\overrightarrow{XA} = \overrightarrow{AX'}$ and
$\overrightarrow{BX''} = \overrightarrow{X'B}$. Substituting the last expressions into (11.60), we obtain that
$\overrightarrow{XX''} = 2\overrightarrow{AB}$. In other words, this means that if the vector $e$ is equal to
$2\overrightarrow{AB}$, then the quadric $Q$ is invariant under the translation $T_e$; see Fig. 11.5.
This assertion also follows from an examination of the similar triangles $ABX'$ and
$XX''X'$ in Fig. 11.5.

Fig. 11.5 Similar triangles
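
The computation $\overrightarrow{XX''} = 2\overrightarrow{AB}$ is easy to confirm in coordinates. The following Python sketch (hypothetical names, points given as integer tuples) applies the central symmetry about $A$ and then the central symmetry about $B$, and compares the resulting displacement with $2\overrightarrow{AB}$.

```python
def central_symmetry(center, point):
    """Image of `point` under the central symmetry about `center`: 2*center - point."""
    return tuple(2 * c - p for c, p in zip(center, point))


A, B, X = (1, 2), (4, -1), (7, 5)
X1 = central_symmetry(A, X)   # the point X'
X2 = central_symmetry(B, X1)  # the point X''
displacement = tuple(q - p for p, q in zip(X, X2))
twice_AB = tuple(2 * (b - a) for a, b in zip(A, B))
print(displacement, twice_AB)  # (6, -6) (6, -6)
```

The displacement does not depend on the choice of $X$, which is exactly the statement that the composition of the two symmetries is the translation $T_e$ with $e = 2\overrightarrow{AB}$.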

Since $A \ne B$, the vector $e$ is nonnull. Let us choose an arbitrary frame of
reference $(O; e_1, \dots, e_n)$, where $e_1 = e$. Let us set $L' = \langle e_2, \dots, e_n \rangle$ and consider
the affine space $V' = (L', L')$ and mapping $h : V \to V'$, defined by the following
conditions: $h(O) = O$, $h(A) = O$ if $\overrightarrow{OA} = \lambda e$, and $h(A_i) = e_i$ if $\overrightarrow{OA_i} = e_i$
($i = 2, \dots, n$). It is obvious that the mapping $h$ is a projection and that the set $Q$ is a
cylinder. Since by our assumption, the quadric $Q$ is not a cylinder, we have obtained
a contradiction. □

Thus we obtain that by choosing a system of coordinates with the origin at the
center of the quadric, one can define an arbitrary quadric satisfying the conditions
of Theorem 11.33 by the equation

$$\psi(x_1, \dots, x_n) = c, \tag{11.61}$$

where $\psi$ is a nonsingular quadratic form (in the case of a singular form $\psi$, the
quadric would be a cylinder).

If $c \ne 0$, then we may assume that $c = 1$ by multiplying both sides of equality
(11.61) by $c^{-1}$. Finally, we may execute a linear transformation that preserves the
origin and brings the quadratic form $\psi$ into canonical form (6.22). As a result, the
equation of the quadric takes the form

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_n^2 = c, \tag{11.62}$$

where $c = 0$ or 1, and the number $r$ is the index of inertia of the quadratic form $\psi$.

If $c = 0$ and $r = 0$ or $r = n$, then it follows that $x_1 = 0, \dots, x_n = 0$, that is,
the quadric consists of a single point, the origin, which contradicts the assumption
made above that it does not coincide with some affine subspace. Likewise, for $c =
1$ and $r = 0$, we obtain that $-x_1^2 - \cdots - x_n^2 = 1$, and this is impossible for real
$x_1, \dots, x_n$, so that the quadric is the empty set, which again contradicts our
assumption.

We have thus proved the following assertion.

Theorem 11.34 If a quadric does not coincide with an affine subspace, is not a
cylinder, and has a center, then in some coordinate system, it is defined by equation
(11.62). Moreover, $0 < r \le n$, and if $c = 0$, then $r < n$.

In the case $c = 0$, it is possible, by multiplying the equation of a quadric by $-1$,
to obtain that in (11.62), the number of positive terms is not less than the number of
negative terms, that is, $r \ge n - r$, or equivalently, $r \ge n/2$. In the sequel, we shall
always assume that in the case $c = 0$, this condition is satisfied.

Theorem 11.34 asserts that every quadric that is not an affine subspace or a cylinder
and that has a center can be transformed with the help of a suitable nonsingular
affine transformation into a quadric given by equation (11.62). For $c = 0$ (and only
in this case), the quadric (11.62) is a cone (with its vertex at the origin), that is, for
every one of its points $\boldsymbol{x}$, it also contains the entire line $\langle \boldsymbol{x} \rangle$. It is possible to indicate
another characteristic property of a quadric given by equation (11.62) for $c = 0$: it
is not smooth, while in the case $c = 1$, the quadric is smooth. This follows at once
from the definition of singular points (the equalities $F = 0$ and $\partial F / \partial x_i = 0$).

Let us now consider quadrics without a center. Such a quadric Q is defined by
the equation

$$F(\boldsymbol{x}) = \psi(\boldsymbol{x}) + f(\boldsymbol{x}) + c = 0, \tag{11.63}$$

where $\psi(\boldsymbol{x})$ is a quadratic form, $f(\boldsymbol{x})$ a linear form, and $c$ a scalar. As earlier, we shall
write a symmetric bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ corresponding to a quadratic form $\psi(\boldsymbol{x})$
as $\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$, where $\mathcal{A} : L \to L^*$ is a linear transformation. We have seen
that for a quadric Q not to have a center is equivalent to the condition $f \notin \mathcal{A}(L)$.

Let us choose an arbitrary basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_{n-1}$ in the hyperplane $L' = \langle f \rangle^a$ defined
in the space L by the linear homogeneous equation $f(\boldsymbol{x}) = 0$, and let us extend this
basis to a basis of the entire space L by means of a vector $\boldsymbol{e}_n \perp L'$ such that $f(\boldsymbol{e}_n) = 1$
(here, of course, orthogonality is understood with respect to the
bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$). In the obtained frame of reference $(O; \boldsymbol{e}_1, \dots, \boldsymbol{e}_n)$, equation
(11.63) can be written in the form

$$F(\boldsymbol{x}) = \psi'(x_1, \dots, x_{n-1}) + a x_n^2 + x_n + c = 0, \tag{11.64}$$

where $\psi'$ is the restriction of the quadratic form $\psi$ to the hyperplane $L'$.

Let us now choose in $L'$ a new basis $\boldsymbol{e}'_1, \dots, \boldsymbol{e}'_{n-1}$, in which the quadratic form
$\psi'$ has the canonical form

$$\psi'(x_1, \dots, x_{n-1}) = x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2. \tag{11.65}$$

It is obvious that in this case, the coordinate origin O and the vector $\boldsymbol{e}_n$ remain
unchanged. If as a result, the quadratic form $\psi'$ turned out to depend on fewer than
$n - 1$ variables, then the polynomial $F$ in equation (11.63) would depend on fewer
than $n$ variables, and that, as we have seen, means that the quadric Q is a cylinder.

Let us show that in formula (11.64), the number $a$ is equal to 0. If $a \neq 0$, then by
virtue of the obvious relationship $a x_n^2 + x_n + c = a(x_n + \beta)^2 + c'$, where $\beta = 1/(2a)$
and $c' = c - \beta/2$, we obtain that via the translation $T_{\boldsymbol{a}}$ by the vector $\boldsymbol{a} = -\beta \boldsymbol{e}_n$,
equation (11.64) is transformed into

$$F(\boldsymbol{x}) = \psi'(x_1, \dots, x_{n-1}) + a x_n^2 + c' = 0,$$

where $\psi'$ has the form (11.65). But such an equation, as is easily seen, gives a
quadric with a center.
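
The completing-the-square identity above can be verified with exact rational arithmetic. This is a small sketch only; the function names `complete_square` and `identity_holds` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction


def complete_square(a, c):
    """Return (beta, c_prime) with a*t**2 + t + c == a*(t + beta)**2 + c_prime."""
    a, c = Fraction(a), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    beta = 1 / (2 * a)
    c_prime = c - beta / 2  # equals c - 1/(4a)
    return beta, c_prime


def identity_holds(a, c, samples=range(-3, 4)):
    """Check the identity at several integer sample values of t."""
    a, c = Fraction(a), Fraction(c)
    beta, c_prime = complete_square(a, c)
    return all(a * t * t + t + c == a * (t + beta) ** 2 + c_prime for t in samples)


print(complete_square(2, 5))
print(identity_holds(2, 5), identity_holds(-3, 1))
```

Two quadratic polynomials in one variable that agree at more than two points coincide, so seven sample values are more than enough here.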

Thus assuming that the quadric Q is not a cylinder and does not have a center,
we obtain that its equation has the form

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2 + x_n + c = 0.$$

Now let us perform a translation $T_{\boldsymbol{a}}$ by the vector $\boldsymbol{a} = -c \boldsymbol{e}_n$. As a result, the
coordinates $x_1, \dots, x_{n-1}$ are unchanged, while $x_n$ is changed to $x_n - c$. In the new
coordinates, the equation of the quadric assumes the form

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2 + x_n = 0. \tag{11.66}$$

By multiplying the equation of the quadric by $-1$ and changing the coordinate $x_n$
to $-x_n$, we can obtain that the number of positive squares in equation (11.66) is
not less than the number of negative squares, that is, $r \ge n - r - 1$, or equivalently,
$r \ge (n - 1)/2$.

We have thereby obtained the following result.

Theorem 11.35 Every quadric that is not an affine subspace or a cylinder and does
not have a center can be given in some coordinate system by equation (11.66), where
$r$ is a number satisfying the condition $(n - 1)/2 \le r \le n - 1$.

Thus by combining Theorems 11.34 and 11.35, we obtain the following result:
Every quadric that is not an affine subspace or a cylinder can be given in some
coordinate system by equation (11.62) if it has a center and by equation
(11.66) if it does not have a center. We call these equations canonical.

Theorems 11.34 and 11.35 do more than give the simplest form into which the
equation of a quadric can be transformed through a suitable choice of coordinate
system. Beyond that, it follows from these theorems that quadrics having a canonical
form (11.62) or (11.66) can be affinely equivalent (that is, transformable into each
other by a nonsingular affine transformation) only if their equations coincide.

On the way to proving this assertion, we shall first establish that quadrics defined
by equation (11.66) never have a center. Indeed, writing the equation of a quadric
in the form (11.50), we may say that it has a center only if $f \in \mathcal{A}(L)$. But a simple
verification shows that this condition is not satisfied for quadrics defined by equation
(11.66). Indeed, if in some basis $\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ of the space L, the quadratic form $\psi(\boldsymbol{x})$
is given as

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2,$$

then on choosing the dual basis $\boldsymbol{f}_1, \dots, \boldsymbol{f}_n$ of the dual space $L^*$, we obtain
that the linear transformation $\mathcal{A} : L \to L^*$ associated with $\psi$ by the relationship
$\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$, in which $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is the symmetric bilinear form determined by
the quadratic form $\psi$, has the form $\mathcal{A}(\boldsymbol{e}_i) = \boldsymbol{f}_i$ for $i = 1, \dots, r$, $\mathcal{A}(\boldsymbol{e}_i) = -\boldsymbol{f}_i$ for
$i = r + 1, \dots, n - 1$, and $\mathcal{A}(\boldsymbol{e}_n) = \boldsymbol{0}$, and the linear form $x_n$ coincides with $\boldsymbol{f}_n$. Thus
$\mathcal{A}(L) = \langle \boldsymbol{f}_1, \dots, \boldsymbol{f}_{n-1} \rangle$ and $f = \boldsymbol{f}_n \notin \mathcal{A}(L)$.
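
In coordinates, the center condition amounts to the solvability of a linear system. The sketch below assumes the quadratic part is already diagonal, so that $F = \sum d_i x_i^2 + \sum b_i x_i + c$, and uses the fact that a center $x^0$ must make the linear part vanish after translation, that is, $2 d_i x^0_i + b_i = 0$ for every $i$. The function name `find_center` is illustrative.

```python
from fractions import Fraction


def find_center(diag, lin):
    """Center of F = sum(d_i * x_i**2) + sum(b_i * x_i) + c, or None if there is none.

    A center must satisfy 2 * d_i * x_i + b_i = 0 for every i. When d_i == 0
    the equation is solvable only if b_i == 0 (the coordinate is then free,
    and 0 is returned for it).
    """
    center = []
    for d, b in zip(diag, lin):
        if d == 0:
            if b != 0:
                return None  # the linear form is not in the image of A
            center.append(Fraction(0))
        else:
            center.append(Fraction(-b, 2 * d))
    return tuple(center)


# Equation (11.66) with n = 3, r = 1: x1^2 - x2^2 + x3 = 0 has no center.
print(find_center([1, -1, 0], [0, 0, 1]))
# Equation (11.62) with n = 3, r = 1: the center is the origin.
print(find_center([1, -1, -1], [0, 0, 0]))
```

The first call returns `None` precisely because the coefficient of $x_n^2$ is zero while the coefficient of $x_n$ is not, which mirrors the statement $f \notin \mathcal{A}(L)$.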

We may now formulate the fundamental theorem on the classification of quadrics
with respect to nonsingular affine transformations.

Theorem 11.36 Any quadric that is not an affine subspace or cylinder can be
represented in some coordinate system by the canonical equation (11.62) or (11.66),
where the number $r$ satisfies the conditions indicated in Theorems 11.34 and 11.35
respectively. And conversely, every pair of quadrics having the canonical equation
(11.62) or (11.66) in some coordinate systems can be transformed into each other
by a nonsingular affine transformation only if their canonical equations coincide.

Proof Only the second part of the theorem remains to be proved. We have already
seen that quadrics given by equations (11.62) and (11.66) cannot be mapped into
each other by nonsingular affine transformations, since in the first case, the quadric
has a center, while in the second case, it does not. Therefore, we may consider each
case separately.

Let us begin with the first case. Let there be given two quadrics $Q_1$ and $Q_2$,
given by different canonical equations of the form (11.62) (we note that the
canonical equations in this case differ by the value $c = 0$ or $1$ and index $r$), and where
$Q_2 = g(Q_1)$, with $(g, \mathcal{A})$ a nonsingular affine transformation. By assumption, each
quadric has a unique center, which in its chosen coordinate system coincides with
the point $O = (0, \dots, 0)$.

Let us write down the transformation $g$ in the form (8.19): $g = T_{\boldsymbol{a}} g_0$, where
$g_0(O) = O$. By assumption, $Q_2 = g(Q_1)$, and this means that $g(O) = O$, that is,
the vector $\boldsymbol{a}$ is equal to $\boldsymbol{0}$. In the equations of the quadrics, which we may write in
the form $F_i(\boldsymbol{x}) = \psi_i(\boldsymbol{x}) + c_i = 0$, $i = 1$ and $2$, it is clear that $F_i(\boldsymbol{0}) = c_i$, and this
means that the constants $c_i$ coincide (in the sequel, we shall denote them by $c$). Thus
the equations of the quadrics $Q_1$ and $Q_2$ differ only in the quadratic part $\psi_i(\boldsymbol{x})$.

By Theorem 11.29, the transformation $g$ takes the polynomial $F_1(\boldsymbol{x}) - c$ into
$\lambda(F_2(\boldsymbol{x}) - c)$, where $\lambda$ is some nonzero real number. Consequently, the quadratic
form $\psi_1(\boldsymbol{x})$ is transformed into $\lambda \psi_2(\boldsymbol{x})$ by the linear transformation $\mathcal{A}$. If we
denote the indices of inertia of the quadratic forms $\psi_i(\boldsymbol{x})$ by $r_i$, then from the law of
inertia, it follows that either $r_2 = r_1$ (for $\lambda > 0$) or $r_2 = n - r_1$ (for $\lambda < 0$). In the
case $c = 0$, we may assume that $r_i \ge n/2$, and the equality $r_2 = n - r_1$ is possible
only for $r_2 = r_1$. In the case $c = 1$, this same result follows from the fact that the
transformation $\mathcal{A}$ takes the polynomial $\psi_1(\boldsymbol{x}) - 1$ into $\lambda(\psi_2(\boldsymbol{x}) - 1)$. Comparing
the constant terms, we obtain $\lambda = 1$.

In the case that the quadric has no center, we may repeat the same arguments. We
again obtain that the quadratic form $\psi_1(\boldsymbol{x})$ is carried into $\lambda \psi_2(\boldsymbol{x})$ by a nonsingular
linear transformation. Since each form $\psi_i(\boldsymbol{x})$ contains by assumption the term $x_1^2$,
it follows that $\lambda = 1$, and from the law of inertia, it follows that $r_2 = r_1$ (for $\lambda > 0$),
or $r_2 = n - 1 - r_1$ (for $\lambda < 0$). Since by assumption, $r_i \ge (n - 1)/2$, the equality
$r_2 = n - 1 - r_1$ is possible only for $r_2 = r_1$. □

Thus we see that in a real affine space of dimension $n$, there exists only a finite
number of affinely inequivalent quadrics that are not affine subspaces or cylinders.
Each of them is equivalent to a quadric that can be represented in the form of
equation (11.62) or equation (11.66).

It is possible to compute the number of types of affinely inequivalent quadrics.
Equation (11.62) for $c = 1$ gives $n$ possibilities ($r = 1, \dots, n$). The remaining cases
depend on the parity of the number $n$. If $n = 2m$, then equation (11.62) for $c = 0$ gives
$m$ different types ($r = m, \dots, 2m - 1$), and the same number is given by equation (11.66).
Altogether, we obtain $n + 2m = 2n$ different types in the case of even $n$. If $n = 2m + 1$,
then equation (11.62) for $c = 0$ gives $m$ different types ($r = m + 1, \dots, 2m$), while
equation (11.66) gives $m + 1$ types ($r = m, \dots, 2m$), as follows from the conditions
$r \ge n/2$, $r < n$ and $(n - 1)/2 \le r \le n - 1$ respectively. Altogether in this case we
obtain $n + 2m + 1 = 2n$ different types. Thus in a real affine space of dimension $n \ge 2$,
the number of types of affinely inequivalent quadrics that are not affine subspaces or
cylinders is equal to $2n$ for both even and odd $n$. For example, for $n = 3$ these are the
ellipsoid, the two hyperboloids, the cone, and the two paraboloids. (For $n = 1$, equation
(11.66) reduces to $x_1 = 0$, which is a point and hence an affine subspace, so that only
the type $x_1^2 = 1$ remains.)
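
The count can be checked for small $n$ by listing the admissible canonical equations directly. The sketch below only encodes the conditions on $r$ stated in Theorems 11.34 and 11.35 together with the convention $r \ge n/2$ adopted for $c = 0$; the function name `canonical_types` and the tuple labels are illustrative, and the restriction to $n \ge 2$ is an assumption made to avoid the degenerate one-dimensional case.

```python
def canonical_types(n):
    """List the canonical equations of Theorems 11.34 and 11.35 in dimension n >= 2.

    Each entry is (kind, c, r): kind is "center" for equation (11.62) and
    "no center" for equation (11.66), where c is not used.
    """
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    types = []
    # Equation (11.62) with c = 1: 0 < r <= n.
    types += [("center", 1, r) for r in range(1, n + 1)]
    # Equation (11.62) with c = 0: n/2 <= r < n.
    types += [("center", 0, r) for r in range(1, n) if 2 * r >= n]
    # Equation (11.66): (n - 1)/2 <= r <= n - 1.
    types += [("no center", None, r) for r in range(0, n) if 2 * r >= n - 1]
    return types


for n in range(2, 7):
    print(n, len(canonical_types(n)))
```

For $n = 3$ the list has six entries, matching the familiar ellipsoid, hyperboloids of one and two sheets, cone, and elliptic and hyperbolic paraboloids.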

Remark 11.37 It is easy to see that the content of this section is reduced to the
classification of second-degree polynomials $F(x_1, \dots, x_n)$ up to a nonsingular affine
transformation of the variables and multiplication by a nonzero scalar coefficient.
The connection with the geometric object (the quadric) is established by
Theorem 11.29. That we excluded from consideration the case of affine subspaces is
related to the fact that we wished to emphasize the differences among the geometric
objects that arise.

The assumption that the quadric was not a cylinder was made exclusively to
emphasize the inductive nature of the classification. The limitations that we
introduced could have been done without. By repeating precisely the same arguments,
we obtain that an arbitrary set in $n$-dimensional affine space given by equating a
second-degree polynomial in $n$ variables (the coordinates of a point) to zero is
affinely equivalent to one of the sets defined by the following equations:

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_m^2 = 1, \quad 0 \le r \le m \le n, \tag{11.67}$$

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_m^2 = 0, \quad r \ge \frac{m}{2}, \quad m \le n, \tag{11.68}$$

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{m-1}^2 + x_m = 0, \quad r \ge \frac{m - 1}{2}, \quad m \le n. \tag{11.69}$$

After this, it is easy to see that in the case of (11.67) for $r = 0$, the empty set is
obtained, while in the case (11.68) for $r = 0$ or $r = m$, the result is an affine subspace.
In the remaining cases, it is easy to find a line that intersects the given set in two
distinct points and is not entirely contained in it. By virtue of Theorem 8.14, this
means that such a set is not an affine subspace.

In conclusion, let us say a bit about the topological properties of affine quadrics.

If in equation (11.62), we have $c = 1$ and the index of inertia $r$ is equal to 1, then
this equation can be rewritten in the form $x_1^2 = 1 + x_2^2 + \cdots + x_n^2$, from which it
follows that $x_1^2 \ge 1$, that is, $x_1 \ge 1$ or $x_1 \le -1$. Clearly, it is impossible for a point
of the quadric whose coordinate $x_1$ is greater than or equal to 1 to be continuously deformed
into a point whose coordinate $x_1$ is less than or equal to $-1$ while remaining on the
quadric (see the definition on p. xx). Therefore, a quadric in this case consists of two
components, that is, it consists of two subsets such that no two points lying one in
each of these subsets can be continuously deformed into each other while remaining
on the quadric. It can be shown that each of these components is path connected (see
the definition on p. xx), just as is every quadric given by equation (11.66).

The simplest example of a quadric consisting of two path-connected components
is a hyperbola in the plane; see Fig. 11.6.

Fig. 11.6 A hyperbola

The topological property that we described above has a generalization to quadrics
defined by equation (11.62) for $c = 1$ with smaller values of the index $r$, but still
assuming that $r \ge 1$. Here we shall say a few words about them, without giving a
rigorous formulation and also omitting proofs.

For $r = 1$ we can find two points, $(1, 0, \dots, 0)$ and $(-1, 0, \dots, 0)$, that cannot be
transformed into each other by a continuous motion along the quadric (they could
be given as the sphere $x_1^2 = 1$ in one-dimensional space). For an arbitrary value of
$r$, the quadric contains the sphere

$$x_1^2 + \cdots + x_r^2 = 1, \quad x_{r+1} = 0, \ \dots, \ x_n = 0.$$

One can prove that this sphere cannot be contracted to a single point by continuous
motion along the surface of the quadric. But for every $m < r$ and continuous
mapping $f$ of the sphere $S^{m-1}$: $y_1^2 + \cdots + y_m^2 = 1$ into the quadric, the image
$f(S^{m-1})$ of the sphere can be contracted to a point by continuous motion along the
quadric (it should be clear to the reader what is meant by continuous motion of a set
along a quadric, something that we have already encountered in the case $r = 1$).


11.6 Quadrics in an Affine Euclidean Space 

It remains for us to consider nonsingular quadrics in an affine Euclidean space V.
We shall, as before, exclude the cases in which the quadrics are affine subspaces
or cylinders. The classification of such quadrics up to metric equivalence uses
precisely the same arguments as those used in Sect. 11.5. To some extent, the results
of that section can be applied in our case, since motions are affine transformations.
Therefore, we shall only cursorily recall the line of reasoning.

Generalizing the statement of the problem, which goes back to analytic geometry
(where the cases dim V = 2 and 3 are considered), we shall say that two quadrics are
metrically equivalent if they can be transformed into each other by some motion
of the space V. This definition is a special case of metric equivalence of arbitrary
metric spaces (see p. xxi), to which belong, as is easily verified, all quadrics in an
affine Euclidean space.

First of all, let us consider quadrics given by equations whose linear part can be
annihilated by a translation. These are quadrics that have a center (which, as we
have seen, is unique). Choosing a coordinate origin (that is, a point O of the frame
of reference $(O; \boldsymbol{e}_1, \dots, \boldsymbol{e}_n)$) in the center of the quadric, we bring its equation into
the form

$$\psi(x_1, \dots, x_n) = c,$$

where $\psi(x_1, \dots, x_n)$ is a nonsingular quadratic form and $c$ is a number. If $c \neq 0$, then by
multiplying the equation by $c^{-1}$, we may assume that $c = 1$. For $c = 0$, the quadric
is a cone.

Using an orthogonal transformation, the quadratic form $\psi$ can be brought into
canonical form

$$\psi(x_1, \dots, x_n) = \lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_n x_n^2,$$

where all the numbers $\lambda_1, \dots, \lambda_n$ are nonzero, since by assumption, our quadric
is nonsingular and is neither an affine subspace nor a cylinder, which means that
the quadratic form $\psi$ is nonsingular. Let us separate the positive numbers from the
negative: suppose $\lambda_1, \dots, \lambda_k > 0$ and $\lambda_{k+1}, \dots, \lambda_n < 0$. By tradition going back
to analytic geometry, we shall set $\lambda_i = a_i^{-2}$ for $i = 1, \dots, k$ and $\lambda_j = -a_j^{-2}$ for
$j = k + 1, \dots, n$, where all numbers $a_1, \dots, a_n$ are positive.

Thus every quadric having a center is metrically equivalent to a quadric with
equation

$$\left(\frac{x_1}{a_1}\right)^2 + \cdots + \left(\frac{x_k}{a_k}\right)^2 - \left(\frac{x_{k+1}}{a_{k+1}}\right)^2 - \cdots - \left(\frac{x_n}{a_n}\right)^2 = c, \tag{11.70}$$

where $c = 0$ or $1$. For $c = 0$, multiplying equation (11.70) by $-1$, we may, as in the
affine case, assume that $k \ge n/2$.

Now let us consider the case that the quadric

$$\psi(x_1, \dots, x_n) + f(x_1, \dots, x_n) + c = 0$$

does not have a center, that is, $f \notin \mathcal{A}(L)$, where $\mathcal{A} : L \to L^*$ is the linear
transformation associated with the quadratic form $\psi$ by the relationship $\varphi(\boldsymbol{x}, \boldsymbol{y}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{y}))$,
in which $\varphi(\boldsymbol{x}, \boldsymbol{y})$ is the symmetric bilinear form that gives the quadratic form $\psi$. In
this case, it is easy to verify that as in Sect. 11.5, we can find an orthonormal basis
$\boldsymbol{e}_1, \dots, \boldsymbol{e}_n$ of the space L such that

$$f(\boldsymbol{e}_1) = 0, \ \dots, \ f(\boldsymbol{e}_{n-1}) = 0, \quad f(\boldsymbol{e}_n) = 1,$$

and in the coordinate system determined by the frame of reference $(O; \boldsymbol{e}_1, \dots, \boldsymbol{e}_n)$,
the quadric is given by the equation

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_{n-1} x_{n-1}^2 + x_n + c = 0.$$

Through a translation by the vector $-c \boldsymbol{e}_n$, this equation can be brought into the form

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_{n-1} x_{n-1}^2 + x_n = 0,$$

in which all the coefficients $\lambda_i$ are nonzero, since the quadric is nonsingular and is
not a cylinder.

If $\lambda_1, \dots, \lambda_k > 0$ and $\lambda_{k+1}, \dots, \lambda_{n-1} < 0$, then by multiplying the equation of
the quadric and the coordinate $x_n$ by $-1$ if necessary, we may assume that
$k \ge (n - 1)/2$. Setting, as previously, $\lambda_i = a_i^{-2}$ for $i = 1, \dots, k$ and $\lambda_j = -a_j^{-2}$ for
$j = k + 1, k + 2, \dots, n - 1$, where $a_1, \dots, a_{n-1} > 0$, we bring the previous equation
into the form

$$\left(\frac{x_1}{a_1}\right)^2 + \cdots + \left(\frac{x_k}{a_k}\right)^2 - \left(\frac{x_{k+1}}{a_{k+1}}\right)^2 - \cdots - \left(\frac{x_{n-1}}{a_{n-1}}\right)^2 + x_n = 0. \tag{11.71}$$

Thus every quadric in an affine Euclidean space is metrically equivalent to a
quadric given by equation (11.70) (type I) or (11.71) (type II). Let us verify (under
the given conditions and restriction on $r$) that two quadrics of the form (11.70) or
of the form (11.71) are metrically equivalent only if all the numbers $a_1, \dots, a_n$ (for
type I) and $a_1, \dots, a_{n-1}$ (for type II) in their equations are the same. Here we may
consider separately quadrics of type I and of type II, since they differ even from the
viewpoint of affine equivalence.

By Theorem 8.39, every motion of an affine Euclidean space is the composition
of a translation and an orthogonal transformation. As we saw in Sect. 11.5, a
translation does not alter the quadratic part of the equation of a quadric. By
Theorem 11.29, two quadrics are affinely equivalent only if the polynomials appearing in
their equations differ by a constant factor. But for quadrics of type I for $c = 1$, this
factor must be equal to 1. In the case of a quadric of type I for $c = 0$, multiplication
by $\mu > 0$ means that all the numbers $a_i$ are multiplied by $\mu^{-1/2}$. For a quadric of
type II, this factor must also be equal to 1 in order to preserve the coefficient 1 in
the linear term $x_n$.

Thus we see that if we exclude quadrics of type I with constant term $c = 0$
(a cone), then the quadratic parts of the equations must be quadratic forms
equivalent with respect to orthogonal transformations. But the numbers $\lambda_i$ are defined as
the eigenvalues of the associated symmetric linear transformation, and therefore,
this also determines the numbers $a_i$. In the case of a cone (quadric of type I for
$c = 0$), all the numbers $\lambda_i$ can be multiplied by a common factor that is a positive
number (because of the assumptions made about $r$). This means that the numbers $a_i$
can be multiplied by an arbitrary positive common factor.

Let us note that although our line of reasoning was precisely the same as in the
case of affine equivalence, the result that we obtained was different. We obtained
relative to affine equivalence only a finite number of different types of inequivalent
quadrics, while with respect to metric equivalence, the number is infinite: they are
determined not only by a finite number of values of the index $r$, but also by
arbitrary numbers $a_i$ (which in the case of a cone are defined up to multiplication by a
common positive factor). This fact is presented in a course in analytic geometry; for
example, an ellipse with equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

is defined by its semiaxes $a$ and $b$, and if for two ellipses these are different, then
the ellipses cannot be transformed into each other by a motion of the plane.
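
As an illustration of how the metric invariants arise from eigenvalues, the following sketch computes the semiaxes of a plane ellipse given in a non-canonical orthonormal frame. It assumes the ellipse is centered at the origin and written as $A x^2 + 2B x y + C y^2 = 1$ with a positive definite quadratic part; the function name `semi_axes` and this parametrization are assumptions made for the example, and the semiaxes are obtained from the relation $\lambda_i = a_i^{-2}$.

```python
import math


def semi_axes(A, B, C):
    """Semiaxes (larger first) of the ellipse A*x**2 + 2*B*x*y + C*y**2 = 1.

    The eigenvalues of the symmetric matrix [[A, B], [B, C]] are
    mean +/- radius, and each semiaxis equals eigenvalue ** (-1/2).
    """
    mean = (A + C) / 2
    radius = math.hypot((A - C) / 2, B)
    lam_small, lam_large = mean - radius, mean + radius
    if lam_small <= 0:
        raise ValueError("quadratic part is not positive definite: not an ellipse")
    return 1 / math.sqrt(lam_small), 1 / math.sqrt(lam_large)


# x^2/4 + y^2 = 1 in canonical position.
print(semi_axes(0.25, 0.0, 1.0))
# The same ellipse rotated by 45 degrees: 5x^2 + 6xy + 5y^2 = 8.
print(semi_axes(5 / 8, 3 / 8, 5 / 8))
```

Both calls describe ellipses with the same semiaxes, so they are metrically equivalent; changing either semiaxis would give a metrically inequivalent ellipse.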

For arbitrary $n$, quadrics having a canonical equation (11.70) with $k = n$ and
$c = 1$ are called ellipsoids. The equation of an ellipsoid can be rewritten in the form

$$\left(\frac{x_1}{a_1}\right)^2 + \cdots + \left(\frac{x_n}{a_n}\right)^2 = 1, \tag{11.72}$$

from which it follows that $|x_i / a_i| \le 1$ and hence $|x_i| \le a_i$. If the largest of these
numbers $a_1, \dots, a_n$ is denoted by $a$, then we obtain that $|x_i| \le a$. This property is
expressed by saying that the ellipsoid is a bounded set. The interested reader can
easily prove that among all quadrics, only ellipsoids have this property.

If we renumber the coordinates in such a way that in the equation of the ellipsoid
(11.72), the coefficients are $a_1 \ge a_2 \ge \cdots \ge a_n$, then we obtain

$$\frac{x_1^2 + \cdots + x_n^2}{a_1^2} \le \left(\frac{x_1}{a_1}\right)^2 + \cdots + \left(\frac{x_n}{a_n}\right)^2 \le \frac{x_1^2 + \cdots + x_n^2}{a_n^2},$$

whence for every point $\boldsymbol{x} = (x_1, \dots, x_n)$ lying on the ellipsoid, we have the
inequality $a_n \le |\boldsymbol{x}| \le a_1$. This means that the distance from the center O of the ellipsoid
to the point $\boldsymbol{x}$ is not greater than to the point $A = (a_1, 0, \dots, 0)$ and not less than to
the point $B = (0, \dots, 0, a_n)$. These two points, or more precisely, the segments $OA$
and $OB$, are called the semimajor and semiminor axes of the ellipsoid.


11.7 Quadrics in the Real Plane* 

In this section, we shall not be proving any new facts. Rather, our goal is to
establish a connection between results obtained earlier and facts familiar from analytic
geometry, in particular, the interpretation of quadrics in the real plane as conic
sections, which was known already to the ancient Greeks.

Let us begin by considering the simplest example, in which it is possible to see
the difference between the affine and projective classifications of quadrics, that is,
quadrics in the real affine and real projective planes. But for this, we must first refine
(or recall) the statement of the problem.

By the definition from Sect. 9.1, we may represent a projective space of arbitrary
dimension $n$ in the form P(L), where L is a vector space of dimension $n + 1$. An
affine space of the same dimension $n$ can be considered the affine part of P(L),
determined by the condition $\varphi \neq 0$, where $\varphi$ is some nonnull linear function on L. It
can also be identified with the set $W_\varphi$, defined by the condition $\varphi(\boldsymbol{x}) = 1$. This set is
an affine subspace of L (we may view L as its own space of vectors). In the sequel,
we shall make use of precisely this construction of an affine space.

A quadric Q in a projective space P(L) is given by an equation $F(\boldsymbol{x}) = 0$, where
$F$ is a homogeneous second-degree polynomial. In the space L, the collection of all
vectors for which $F(\boldsymbol{x}) = 0$ forms a cone $K$. Let us recall that a cone is a set $K$
such that for every vector $\boldsymbol{x} \in K$, the entire line $\langle \boldsymbol{x} \rangle$ containing $\boldsymbol{x}$ is also contained
in $K$. A cone associated with a quadric is called a quadratic cone. From this point
of view, the projective classification of quadrics coincides with the classification of
quadratic cones with respect to nonsingular linear transformations.

Thus an affine quadric Q can be represented in the form $W_\varphi \cap K$ using the
previously given notation $W_\varphi$ and $K$. Quadrics $Q_1 \subset W_{\varphi_1}$ and $Q_2 \subset W_{\varphi_2}$ are
by definition affinely equivalent if there exists a nonsingular affine transformation
$W_{\varphi_1} \to W_{\varphi_2}$ mapping $Q_1$ to $Q_2$. This means that we have a nonsingular linear
transformation $\mathcal{A}$ of the vector space L for which

$$\mathcal{A}(W_{\varphi_1}) = W_{\varphi_2} \quad \text{and} \quad \mathcal{A}(W_{\varphi_1} \cap K_1) = W_{\varphi_2} \cap K_2,$$

where $K_1$ and $K_2$ are quadratic cones associated with the quadrics $Q_1$ and $Q_2$.

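First of all, let us examine how the mapping $\mathcal{A}$ acts on the set $W_\varphi$. To this end,
let us recall that in the space $L^*$ of linear functions on L there are defined dual
transformations $\mathcal{A}^*$ for which

$$\mathcal{A}^*(\varphi)(\boldsymbol{x}) = \varphi(\mathcal{A}(\boldsymbol{x}))$$

for all vectors $\boldsymbol{x} \in L$ and $\varphi \in L^*$. In other words, this means that if $\mathcal{A}^*(\varphi) = \psi$,
then the linear function $\psi(\boldsymbol{x})$ is equal to $\varphi(\mathcal{A}(\boldsymbol{x}))$. Since the transformation $\mathcal{A}$ is
nonsingular, the dual transformation $\mathcal{A}^*$ is also nonsingular, and therefore, there
exists an inverse transformation $(\mathcal{A}^*)^{-1}$. By definition, $(\mathcal{A}^*)^{-1}(\varphi)(\mathcal{A}(\boldsymbol{x})) = 1$ if
$\varphi(\boldsymbol{x}) = 1$, that is, $\mathcal{A}$ takes $W_\varphi$ into the set $W_{(\mathcal{A}^*)^{-1}(\varphi)}$.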

Since in previous sections, we considered only nonsingular projective quadrics, it
is natural to set corresponding restrictions in the affine case as well. To this end, we
shall use, as earlier, the representation of affine quadrics in the form $Q = W_\varphi \cap K$.
A quadratic cone $K$ determines a projective quadric $\overline{Q}$ associated with the affine
quadric Q. It is easy to express this correspondence in coordinates. If we choose in L a
system of coordinates $(x_0, x_1, \dots, x_n)$, then in $W_{x_0}$ are defined inhomogeneous
coordinates $y_1, \dots, y_n$ by the formula $y_i = x_i / x_0$. If the quadric Q is given by the
second-degree equation

$$f(y_1, \dots, y_n) = 0,$$

then the quadric $\overline{Q}$ (and cone $K$) is given by the equation

$$F(x_0, x_1, \dots, x_n) = 0, \quad \text{where } F = x_0^2 \, f\!\left(\frac{x_1}{x_0}, \dots, \frac{x_n}{x_0}\right).$$

Thus the projective quadric $\overline{Q}$ is uniquely defined by the affine quadric Q.
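
The passage from the affine equation to the homogeneous one is purely mechanical, as the following sketch suggests. It assumes a polynomial of degree at most 2 is stored as a dictionary mapping exponent tuples $(k_1, \dots, k_n)$ of $y_1, \dots, y_n$ to coefficients; this representation and the function name `homogenize` are illustrative choices, not notation from the text.

```python
def homogenize(poly):
    """Homogenize a polynomial of degree <= 2 via y_i = x_i / x_0 and multiplication by x_0**2.

    `poly` maps exponent tuples (k1, ..., kn) to coefficients. The result maps
    exponent tuples (k0, k1, ..., kn) to the same coefficients, where
    k0 = 2 - (k1 + ... + kn), so every monomial has total degree 2.
    """
    result = {}
    for exponents, coefficient in poly.items():
        degree = sum(exponents)
        if degree > 2:
            raise ValueError("only polynomials of degree at most 2 are supported")
        result[(2 - degree,) + tuple(exponents)] = coefficient
    return result


# y1^2 + y2^2 - 1 = 0 becomes x1^2 + x2^2 - x0^2 = 0.
print(homogenize({(2, 0): 1, (0, 2): 1, (0, 0): -1}))
# y1^2 + y2 = 0 becomes x1^2 + x0*x2 = 0.
print(homogenize({(2, 0): 1, (0, 1): 1}))
```

Setting $x_0 = 1$ in the output recovers the original affine equation, which is one way to see that the projective quadric is determined by the affine one.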

Definition 11.38 An affine quadric Q is said to be nonsingular if the associated
projective quadric $\overline{Q}$ is nonsingular.

In a space of arbitrary dimension $n$, all quadrics with canonical equations
(11.67)–(11.69) for $m < n$ are singular. Furthermore, a quadric of type (11.68) is
singular as well for $m = n$. Both these assertions can be verified directly from the
definitions; we have only to designate the coordinates $x_1, \dots, x_n$ by $y_1, \dots, y_n$,
introduce homogeneous coordinates $x_0 : x_1 : \cdots : x_n$, setting $y_i = x_i / x_0$, and multiply
all the equations by $x_0^2$. It is very easy to write down the matrix of a quadratic form
$F(x_0, x_1, \dots, x_n)$.

In particular, for $n = 2$, we obtain three equations:

$$y_1^2 + y_2^2 = 1, \quad y_1^2 - y_2^2 = 1, \quad y_1^2 + y_2 = 0. \tag{11.73}$$

From the results of Sect. 11.5, it follows that for $n = 2$, every nonsingular affine
quadric is affinely equivalent to a quadric of one (and only one) of these three types.
The corresponding quadrics are called ellipses, hyperbolas, and parabolas.

On the other hand, in Sect. 11.4, we saw that all nonsingular projective quadrics
are projectively equivalent. This result can serve as a graphic representation of affine
quadrics. As we have seen, every affine quadric can be represented in the form
$Q = W_\varphi \cap K$, where $K$ is some quadratic cone. It is affinely equivalent to the quadric

$$\mathcal{A}(W_\varphi \cap K) = W_{(\mathcal{A}^*)^{-1}(\varphi)} \cap \mathcal{A}(K),$$

where $\mathcal{A}$ is an arbitrary nonsingular linear transformation of the space L.

Here arises the specific nature of the case $n = 2$ (dim L = 3). By what has been
proved earlier, every cone $K$ associated with a nonsingular quadric can be mapped
to every other such cone by a nonsingular transformation $\mathcal{A}$. In particular, we may
assume that $\mathcal{A}(K) = K_0$, where the cone $K_0$ is given in some coordinate system
$x_0, x_1, x_2$ of the space L by the equation $x_1^2 + x_2^2 = x_0^2$. This cone is obtained by
the rotation of one of its generatrices, that is, a line lying entirely on the cone (for
example, the line $x_1 = x_0$, $x_2 = 0$) about the axis $x_0$ (that is, the line $x_1 = x_2 = 0$). In
the cone $K_0$ that we have chosen, the angle between the generatrix and the axis $x_0$
is equal to $\pi/4$. In other words, this means that each pole of the cone $K_0$ is obtained
by a rotation of the sides of an isosceles right triangle around its bisector.

Setting $(\mathcal{A}^*)^{-1}(\varphi) = \psi$, we obtain that an arbitrary nonsingular affine quadric
is affinely equivalent to the quadric $W_\psi \cap K_0$. Here $W_\psi$ is an arbitrary plane in the
space L not passing through the vertex of the cone $K_0$, that is, through the point
$O = (0, 0, 0)$. Thus every nonsingular affine quadric is affinely equivalent to a
planar section of a right circular cone. This explains the terminology conic used for
quadrics in the plane.

It is well known from analytic geometry how the three conics that we have found
(ellipses, hyperbolas, and parabolas) are obtained from a single (from the point of
view of projective classification) curve. If we begin with equations (11.73), then the
difference in the three types is revealed by writing these equations in homogeneous
coordinates. Setting $y_1 = x_1 / x_0$ and $y_2 = -x_2 / x_0$, we obtain the equations

$$x_1^2 + x_2^2 = x_0^2, \quad x_1^2 - x_2^2 = x_0^2, \quad x_1^2 - x_0 x_2 = 0. \tag{11.74}$$

The differences among these equations can be found in the different natures of the
sets of intersection with the infinite line $l_\infty$ given by the equation $x_0 = 0$. For an
ellipse, this set is empty; for a hyperbola, it consists of two points, $(0 : 1 : 1)$ and
$(0 : 1 : -1)$, and for a parabola, it consists of the single point $(0 : 0 : 1)$ (substitution
into equation (11.73) shows that the line $l_\infty$ is tangent to the parabola at the point of
intersection); see Fig. 11.7.

Fig. 11.7 Intersection of a conic with an infinite line

We saw in Sect. 9.2 that an affine transformation coincides with a projective
transformation that preserves the line $l_\infty$. Therefore, the type of set $\overline{Q} \cap l_\infty$ (empty
set, two points, one point) should be the same for affinely equivalent quadrics Q. In
our case, the actual content of what we proved in Sect. 11.4 is that the type of set
$\overline{Q} \cap l_\infty$ determines the quadric Q up to affine equivalence.
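
For a conic given in affine coordinates, the points on the line at infinity depend only on the quadratic part of its equation. The sketch below assumes a nonsingular, nonempty conic whose quadratic part is $A y_1^2 + 2B y_1 y_2 + C y_2^2$ (not identically zero); the function names and this parametrization are illustrative assumptions.

```python
def points_at_infinity(A, B, C):
    """Number of real points (0 : x1 : x2) satisfying A*x1**2 + 2*B*x1*x2 + C*x2**2 = 0."""
    if A == 0 and B == 0 and C == 0:
        raise ValueError("the quadratic part must not vanish identically")
    discriminant = B * B - A * C
    if discriminant < 0:
        return 0
    if discriminant == 0:
        return 1
    return 2


def affine_type(A, B, C):
    """Affine type of a nonsingular, nonempty conic with the given quadratic part."""
    return {0: "ellipse", 1: "parabola", 2: "hyperbola"}[points_at_infinity(A, B, C)]


# The three equations (11.73): y1^2 + y2^2 = 1, y1^2 - y2^2 = 1, y1^2 + y2 = 0.
print(affine_type(1, 0, 1), affine_type(1, 0, -1), affine_type(1, 0, 0))
```

The linear and constant terms do not enter at all, since they are multiplied by a power of $x_0$ after homogenization and vanish on the line $x_0 = 0$.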

But if we begin with the representation of a conic as the intersection of the cone
$K_0$ with the plane $W_\psi$, then different types appear due to a different disposition of
the plane $W_\psi$ with respect to the cone $K_0$. Let us recall that the vertex O of the cone
$K_0$ partitions it into two poles. If the equation of the cone has the form $x_1^2 + x_2^2 = x_0^2$,
then each pole is determined by the sign of $x_0$.

Let us consider the plane parallel to $W_\psi$ and passing through the point O.
This plane is given by the equation $\psi = 0$. If it has no points of intersection with
the cone $K_0$ other than O, then $W_\psi$ intersects one of its poles (for example, the one
within which lies the point of intersection of $W_\psi$ and the axis $x_0$). In this case, the conic
$W_\psi \cap K_0$ lies within one pole and is an ellipse.

For example, in the special case in which the plane $W_\psi$ is orthogonal to the axis
$x_0$, we obtain a circle. If we move the plane $W_\psi$ (for example, decrease its angle with
the axis $x_0$), then in its intersection with the cone $K_0$, an ellipse is obtained whose
eccentricity increases as the angle is decreased; see Fig. 11.8(a). The limiting
position is reached when the parallel plane $\psi = 0$ through O is tangent to the cone $K_0$
on a generatrix. Then $W_\psi$ again intersects $K_0$ in one pole (the one that contains the
intersection with the axis $x_0$). This intersection is a parabola; see Fig. 11.8(b). And if
the plane $\psi = 0$ intersects $K_0$ in two different generatrices, then $W_\psi$ intersects both of
its poles (on the side of the plane $\psi = 0$ on which is located the plane $W_\psi$ parallel to
it). This intersection is a hyperbola; see Fig. 11.8(c).

Fig. 11.8 Conic sections

The connection between planar quadrics and conic sections is revealed particularly
clearly by the metric classification of such quadrics, which forms part of any
sufficiently rigorous course in analytic geometry. Let us recall only the main results.

As was done in Sect. 11.5, we must exclude from consideration those conics that
are cylinders and those that are unions of vector subspaces (that is, in our case, lines
or points). Then the results obtained in Sect. 11.5 give us (in coordinates $x$, $y$) the
following three types of conic:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \qquad x^2 + a^2 y = 0, \tag{11.75}$$

where a > 0 and b > 0. From the point of view of affine classification presented 
above, curves of the first type are ellipses, those of the second type are hyperbolas, 
and those of the third type are parabolas. 

Let us recall that in a course in analytic geometry, these curves are defined as
geometric loci of points of the plane satisfying certain conditions. Namely, an ellipse
is the geometric locus of points the sum of whose distances from two given points
in the plane is constant. A hyperbola is defined analogously with sum replaced by
difference. A parabola is the geometric locus of points equidistant from a given point
and a given line that does not pass through the given point.

There is an elegant and elementary proof of the fact that all ellipses, hyperbolas,
and parabolas are not only affinely, but also metrically, that is, as geometric loci of
points, equivalent to planar sections of a right circular cone. Let us recall that by a
right circular cone we mean a cone $K$ in three-dimensional space obtained as the
result of a rotation of a line about some other line, called the axis of the cone. The
lines forming the cone are called its generatrices; they intersect the axis of the cone
in one common point, called its vertex.

In other words, this result means that the section of a right circular cone with a
plane not passing through the vertex of the cone is either an ellipse, a hyperbola, or a
parabola, and every ellipse, hyperbola, and parabola coincides with the intersection
of a right circular cone with a suitable plane. 5


5 The proof of this fact is due to the Franco-Belgian mathematician Germinal Pierre Dandelin. It
can be found, for example, in A. P. Veselov and E. V. Troitsky, Lectures in Analytic Geometry (in
Russian); B. N. Delone and D. A. Raikov, Analytic Geometry (in Russian); P. Dandelin, Mémoire
sur l’hyperboloïde de révolution, et sur les hexagones de Pascal et de M. Brianchon; D. Hilbert
and S. Cohn-Vossen, Geometry and the Imagination.


Chapter 12 

Hyperbolic Geometry 


The discovery of hyperbolic (or Lobachevskian) geometry had an enormous impact
on the development of mathematics and on how the relationship between mathematics
and the real world was understood. The discussions that swirled around the new
geometry also seem to have influenced the views of many in the humanities, who, in
this regard, unfortunately were too much taken by a literary image: the contrast between
“down-to-earth” Euclidean geometry and the “otherworldly” non-Euclidean
geometry invented by learned mathematicians. It seemed that the difference between
the two geometries was that in the first geometry, as was clear to everyone, parallel
lines did not intersect, while in the second, difficult as this was for ordinary intelligence
to comprehend, they did intersect. However, of course, this is exactly the opposite
of the truth: in the non-Euclidean geometry of Lobachevsky, given a point external
to a given line, it is possible for infinitely many lines to pass through the point without
intersecting the line. It is this that distinguishes Lobachevsky’s geometry from
that of Euclid.

Ivan Karamazov, in Dostoevsky’s novel The Brothers Karamazov, likely sowed
confusion among those in the humanities with the following literary image:

At the same time there were and are even now geometers and philosophers, even some of the 
most outstanding among them, who doubt that the whole universe, or, even more broadly, 
the whole of being, was created purely in accordance with Euclidean geometry; they even 
dare to dream that two parallel lines, which according to Euclid cannot possibly meet on 
earth, may perhaps meet somewhere in infinity. 

Around the time this novel was being written, Friedrich Engels wrote Anti-Dühring,
where an even more vivid image is used:

But in higher mathematics, another contradiction is achieved, that lines that intersect before 
our eyes, nevertheless a mere five or six centimeters from their point of intersection are to 
be considered parallel, that is, lines that cannot intersect even when extended to infinity. 

In this, the author sees the manifestation of some sort of “dialectic.” 

And even up to the present, it is possible to encounter, in print, such literary
images that oppose Euclidean and non-Euclidean geometries by saying that in the
former, parallel lines do not intersect, while in the latter, they “intersect somewhere
or other.” Usually, by non-Euclidean geometry is meant the hyperbolic geometry of
Lobachevsky, which is quite understandable by anyone who has passed a college
course in some technical subject, and there are many such people today. To be sure,
nowadays, this is presented in mathematics departments in more advanced courses
in differential geometry. But hyperbolic geometry is so tightly linked to a first course
in linear algebra that it would be a pity not to say something about it here.


12.1 Hyperbolic Space* 

In this chapter we shall be dealing exclusively with real vector spaces. 

We shall define hyperbolic space of dimension $n$, which we shall hereinafter
denote by $\mathbb{L}_n$ or simply $\mathbb{L}$ if we do not need to indicate the dimension, as a part of
$n$-dimensional projective space $\mathbb{P}(\mathsf{L})$, where $\mathsf{L}$ is a real vector space of dimension
$n + 1$. We shall denote the dimension of the space $\mathbb{L}$ by $\dim \mathbb{L}$.

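Let us equip $\mathsf{L}$ with a pseudo-Euclidean product $(\boldsymbol{x}, \boldsymbol{y})$; see Sect. 7.7. Let us
recall that there, the quadratic form $(\boldsymbol{x}^2)$ has index of inertia $n$, and in some basis
$\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{n+1}$ (called orthonormal), for the vector

$$\boldsymbol{x} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n + \alpha_{n+1} \boldsymbol{e}_{n+1}, \tag{12.1}$$

it takes the form

$$(\boldsymbol{x}^2) = \alpha_1^2 + \cdots + \alpha_n^2 - \alpha_{n+1}^2. \tag{12.2}$$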

In the pseudo-Euclidean space $\mathsf{L}$, let us consider the light cone $V$ defined by the
condition $(\boldsymbol{x}^2) = 0$. We say that a vector $\boldsymbol{a}$ lies inside the cone $V$ if $(\boldsymbol{a}^2) < 0$ (recall
that in Chap. 7, we called such vectors timelike). It is obvious that the same then
holds as well for all nonzero vectors on the line $\langle \boldsymbol{a} \rangle$, since $((\alpha \boldsymbol{a})^2) = \alpha^2 (\boldsymbol{a}^2) < 0$ for
every real $\alpha \neq 0$ (recall that we are considering the space over the field of real numbers).
Such lines are also said to lie inside the light cone $V$.

Points of the projective space $\mathbb{P}(\mathsf{L})$ corresponding to lines of the space $\mathsf{L}$ lying inside
the light cone $V$ are called points of the space $\mathbb{L}$. Consequently, they correspond
to those lines $\langle \boldsymbol{x} \rangle$ of the space $\mathsf{L}$ that in the form (12.1) satisfy the inequality

$$\alpha_1^2 + \cdots + \alpha_n^2 < \alpha_{n+1}^2. \tag{12.3}$$

In view of condition (12.3), the set $\mathbb{L} \subset \mathbb{P}(\mathsf{L})$ is contained in one affine subset
$\alpha_{n+1} \neq 0$ (see Sect. 9.1). Indeed, in the case $\alpha_{n+1} = 0$, we would obtain in (12.3) the
inequality $\alpha_1^2 + \cdots + \alpha_n^2 < 0$, which is impossible in view of the fact that $\alpha_1, \ldots, \alpha_n$
are real. As we did previously in Sect. 9.1, we can identify the affine subset $\alpha_{n+1} \neq 0$
with the affine subspace $E : \alpha_{n+1} = 1$ and hence view $\mathbb{L}$ as a part of $E$; see Fig. 12.1.

Fig. 12.1 Model of hyperbolic space

The space of vectors of the affine space $E$ is the vector subspace $E_0 \subset \mathsf{L}$ defined
by the condition $\alpha_{n+1} = 0$. In other words, $E_0 = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n \rangle$. Let us note that the
space of vectors $E_0$ is not simply a vector space. As a subspace of the pseudo-Euclidean
space $\mathsf{L}$, it would seem that it should also be a pseudo-Euclidean space.
But in fact, as can be seen from formula (12.2), the inner product $(\boldsymbol{x}, \boldsymbol{y})$ makes it
a Euclidean space, in which the vectors $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_n$ form an orthonormal basis. This
means that $E$ is an affine Euclidean space, and the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{n+1}$ of the space $\mathsf{L}$
forms within it a frame of reference with respect to which a point of the hyperbolic
space $\mathbb{L} \subset E$ with coordinates $(y_1, \ldots, y_n)$ is characterized by the relationship

$$y_1^2 + \cdots + y_n^2 < 1, \qquad y_i = \frac{\alpha_i}{\alpha_{n+1}}, \quad i = 1, \ldots, n. \tag{12.4}$$

This set is called the interior of the unit sphere in E and will be denoted by U. 
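The passage from a line inside the light cone to a point of $U$ can be sketched in a few lines of Python. This is a minimal illustration, assuming that vectors are given by their coordinates $(\alpha_1, \ldots, \alpha_{n+1})$ in an orthonormal basis; the function names `pseudo_square` and `to_ball` are ours.

```python
def pseudo_square(alpha):
    """Scalar square (x^2) = a_1^2 + ... + a_n^2 - a_{n+1}^2, as in (12.2)."""
    *space, last = alpha
    return sum(a * a for a in space) - last * last


def to_ball(alpha):
    """Inhomogeneous coordinates y_i = a_i / a_{n+1} of the point of the
    hyperbolic space defined by a timelike vector, as in (12.4)."""
    if pseudo_square(alpha) >= 0:
        raise ValueError("vector is not timelike; it defines no point of the hyperbolic space")
    *space, last = alpha          # last != 0 is guaranteed by (12.3)
    return tuple(a / last for a in space)


y = to_ball((1.0, 2.0, 3.0))
print(y, sum(c * c for c in y) < 1)
```

Proportional vectors, for instance $(1, 2, 3)$ and $(-2, -4, -6)$, give the same point, in agreement with the fact that points of $\mathbb{L}$ correspond to lines and not to individual vectors.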

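Let us now turn our attention to identifying the subspaces of a hyperbolic space.
They correspond to those vector subspaces $\mathsf{L}' \subset \mathsf{L}$ that have a common point with
the interior of the light cone $V$, that is, they contain a timelike vector $\boldsymbol{a} \in \mathsf{L}'$.
The inner product $(\boldsymbol{x}, \boldsymbol{y})$ defined in $\mathsf{L}$ is clearly also defined for all vectors in
the subspace $\mathsf{L}' \subset \mathsf{L}$. The space $\mathsf{L}'$ contains the timelike vector $\boldsymbol{a}$, and therefore,
by Lemma 7.53, it is a pseudo-Euclidean space, and therefore, the associated hyperbolic
space $\mathbb{L}' \subset \mathbb{P}(\mathsf{L}')$ is defined. Since $\mathbb{P}(\mathsf{L}') \subset \mathbb{P}(\mathsf{L})$ is a projective subspace,
it follows that $\mathbb{L}' \subset \mathbb{P}(\mathsf{L})$. But the hyperbolic space $\mathbb{L}'$ is defined by the condition
$(\boldsymbol{x}^2) < 0$ both in $\mathbb{P}(\mathsf{L})$ and in $\mathbb{P}(\mathsf{L}')$, and therefore, $\mathbb{L}' \subset \mathbb{L}$. Here by definition,
$\dim \mathbb{L}' = \dim \mathbb{P}(\mathsf{L}') = \dim \mathsf{L}' - 1$. The hyperbolic space $\mathbb{L}'$ thus constructed is called
a subspace in $\mathbb{L}$.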

In particular, if $\mathsf{L}'$ is a hyperplane in $\mathsf{L}$, then $\dim \mathbb{L}' = \dim \mathbb{L} - 1$, and then the
subspace $\mathbb{L}' \subset \mathbb{L}$ is called a hyperplane in $\mathbb{L}$.

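In the sequel we shall require the partition of $\mathbb{L}$ into two parts by the hyperplane
$\mathbb{L}' \subset \mathbb{L}$:

$$\mathbb{L} \setminus \mathbb{L}' = \mathbb{L}^+ \cup \mathbb{L}^-, \qquad \mathbb{L}^+ \cap \mathbb{L}^- = \varnothing, \tag{12.5}$$

similar to how in Sect. 3.2, the partition of the vector space $\mathsf{L}$ into two half-spaces
was accomplished with the help of the hyperplane $\mathsf{L}' \subset \mathsf{L}$.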

The partition (12.5) of the space $\mathbb{L}$ cannot be accomplished by an analogous
partition of the projective space $\mathbb{P}(\mathsf{L})$. Indeed, if we use the definition of the subsets
$\mathsf{L}^+$ and $\mathsf{L}^-$ from Sect. 3.2, then we see that for a vector $\boldsymbol{x} \in \mathsf{L}^+$, the vector $\alpha \boldsymbol{x}$ is in
$\mathsf{L}^-$ if $\alpha < 0$, so that the condition $\boldsymbol{x} \in \mathsf{L}^+$ does not hold for the whole line $\langle \boldsymbol{x} \rangle$. But such a
partition is possible for the affine Euclidean space $E$; it was constructed in Sect. 8.2
(see p. 299).

Let us recall that the partition of the affine space $E$ by the hyperplane $E' \subset E$
was defined via the partition of the space of vectors $E_0$ of the affine space $E$ with
the aid of the hyperplane $E_0' \subset E_0$ corresponding to the affine hyperplane $E'$, that
is, consisting of the vectors $\overrightarrow{AB}$, where $A$ and $B$ are all possible points of $E'$. If we
are given a partition $E_0 \setminus E_0' = E_0^+ \cup E_0^-$, then we must choose an arbitrary point
$O \in E'$ and define $E^+$ as the collection of all points $A \in E$ such that $\overrightarrow{OA} \in E_0^+$ ($E^-$
is defined analogously). The sets $E^+$ and $E^-$ thus obtained are called half-spaces,
and they do not depend on the choice of the point $O \in E'$. Thus we have partitioned the
set $E \setminus E'$ into two half-spaces: $E \setminus E' = E^+ \cup E^-$.

Fig. 12.2 Hyperbolic half-spaces

Let $\mathsf{L}'$ be a hyperplane in the pseudo-Euclidean space $\mathsf{L}$ having nonempty intersection
with the interior of the light cone $V$, and let $E'$ be the associated hyperplane
in the affine space $E$, that is, $E' = E \cap \mathbb{P}(\mathsf{L}')$. Then $E'$ has nonempty intersection
with the interior of the unit sphere $U$, given by relationship (12.4), and for the set
$\mathbb{L} \subset E$, we obtain the partition (12.5), where

$$\mathbb{L}' = \mathbb{L} \cap E', \qquad \mathbb{L}^+ = E^+ \cap \mathbb{L}, \qquad \mathbb{L}^- = E^- \cap \mathbb{L}. \tag{12.6}$$

The sets $\mathbb{L}^+$ and $\mathbb{L}^-$ defined by relationships (12.6) are called half-spaces of the
space $\mathbb{L}$.

To put it more simply, the hyperplane $E'$ divides the interior of the sphere $U \subset E$,
identified with the space $\mathbb{L}$, into two parts, $U^+$ and $U^-$ (see Fig. 12.2), which correspond
to the half-spaces $\mathbb{L}^+$ and $\mathbb{L}^-$.

Let us show that both half-spaces $\mathbb{L}^+$ and $\mathbb{L}^-$ are nonempty, although Fig. 12.2
is sufficiently convincing by itself. We give the proof for $\mathbb{L}^+$ (for $\mathbb{L}^-$, the proof is
similar).

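Let us consider an arbitrary point $O \in E' \cap \mathbb{L}$. It corresponds to the vector
$\boldsymbol{a} = \alpha_1 \boldsymbol{e}_1 + \cdots + \alpha_n \boldsymbol{e}_n + \boldsymbol{e}_{n+1}$ with $(\boldsymbol{a}^2) < 0$ (see the definition of the affine space $E$ on
p. 434). Let $\boldsymbol{c} \in E_0$ and $B \in E^+$ be such that $\overrightarrow{OB} = \boldsymbol{c}$. Let us consider the vectors
$\boldsymbol{b}_t = \boldsymbol{a} + t\boldsymbol{c} \in \mathsf{L}$ and the points $B_t \in E$ corresponding to them, so that $\overrightarrow{OB_t} = t\boldsymbol{c}$, for
varying values of $t \in \mathbb{R}$. Let us note that if $t > 0$, then $B_t \in E^+$, and if in addition
$(\boldsymbol{b}_t^2) < 0$, then $B_t \in E^+ \cap \mathbb{L} = \mathbb{L}^+$. As can be seen without difficulty, the scalar square
$(\boldsymbol{b}_t^2)$ is a quadratic trinomial in $t$:

$$(\boldsymbol{b}_t^2) = ((\boldsymbol{a} + t\boldsymbol{c})^2) = (\boldsymbol{a}^2) + 2t(\boldsymbol{a}, \boldsymbol{c}) + t^2(\boldsymbol{c}^2) = P(t). \tag{12.7}$$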

By our selection, the vector $\boldsymbol{c} \neq \boldsymbol{0}$ belongs to the Euclidean space $E_0$, and therefore,
$(\boldsymbol{c}^2) > 0$. On the other hand, by assumption, we have $(\boldsymbol{a}^2) < 0$. This yields that
the discriminant of the quadratic trinomial $P(t)$ on the right-hand side of relationship
(12.7) is positive, and therefore, $P(t)$ has two real roots, $t_1$ and $t_2$. Since
$t_1 t_2 = (\boldsymbol{a}^2)/(\boldsymbol{c}^2)$, it follows from the condition $(\boldsymbol{a}^2) < 0$ that the roots have different
signs, that is, $t_1 t_2 < 0$. Then, as is easy to see, $P(t) < 0$ for every $t$ between the roots
$t_1$ and $t_2$. We will choose a positive such number $t$.
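The argument with the trinomial $P(t)$ is easy to check numerically. The following Python sketch is only an illustration for $n = 2$ with sample vectors of our own choosing; the helper names `form` and `roots_of_P` are hypothetical, and coordinates are taken in an orthonormal basis, so that the last coordinate enters the product with a minus sign.

```python
import math


def form(x, y):
    """Pseudo-Euclidean product (x, y) in an orthonormal basis."""
    return sum(p * q for p, q in zip(x[:-1], y[:-1])) - x[-1] * y[-1]


def roots_of_P(a, c):
    """Real roots t1 < t2 of P(t) = (a^2) + 2t(a, c) + t^2 (c^2), see (12.7)."""
    lead, mid, free = form(c, c), 2 * form(a, c), form(a, a)
    disc = mid * mid - 4 * lead * free   # positive, since lead > 0 and free < 0
    r = math.sqrt(disc)
    return (-mid - r) / (2 * lead), (-mid + r) / (2 * lead)


a = (0.5, 0.0, 1.0)   # timelike vector with last coordinate 1: (a^2) = -0.75
c = (0.0, 1.0, 0.0)   # nonzero vector of E_0: (c^2) = 1
t1, t2 = roots_of_P(a, c)
t = t2 / 2            # a positive number lying between the roots
b_t = tuple(p + t * q for p, q in zip(a, c))
print(t1 < 0 < t2, form(b_t, b_t) < 0)
```

For the chosen $t$ the vector $\boldsymbol{b}_t$ is timelike, so the corresponding point $B_t$ lies in $\mathbb{L}^+$, as the proof requires.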

Since the hyperbolic space $\mathbb{L}$ can be viewed as a part of the affine space $E$,
we can transfer from $E$ onto $\mathbb{L}$ the notion of line segment, the notion of lying
between for three points on a line segment, and the notion of convexity. An easy
verification (analogous to what we did at the end of Sect. 8.2) shows that the subsets
$\mathbb{L}^+$ and $\mathbb{L}^-$ of the set $\mathbb{L} \setminus \mathbb{L}'$ introduced earlier are characterized by the property of
convexity: if two points $A$, $B$ are in $\mathbb{L}^+$, then all points lying on the segment $[A, B]$
are also in $\mathbb{L}^+$ (the same clearly holds for the subset $\mathbb{L}^-$).

Let us consider linear transformations $\mathcal{A}$ of the vector space $\mathsf{L}$ that are Lorentz
transformations with respect to the symmetric bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$ corresponding
to the quadratic form $(\boldsymbol{x}^2)$, and the associated projective transformations $\mathbb{P}(\mathcal{A})$. The
latter transformations obviously take the set $\mathbb{L}$ to itself: given that a transformation
$\mathcal{A}$ is a Lorentz transformation, from the condition $(\boldsymbol{x}^2) < 0$ it follows that
$(\mathcal{A}(\boldsymbol{x})^2) = (\boldsymbol{x}^2) < 0$. The transformations of the set $\mathbb{L}$ that arise in this way are
called motions of the hyperbolic space $\mathbb{L}$.
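The defining property of a Lorentz transformation can be tested directly on a matrix. The sketch below is an illustration under our own conventions: a transformation is given by its matrix `A` (a list of rows) in an orthonormal basis, and the helper names `is_lorentz`, `apply`, and `square` are ours. The check is that the columns of `A` have the same pairwise products as the basis vectors, which is equivalent to $(\mathcal{A}(\boldsymbol{x}), \mathcal{A}(\boldsymbol{y})) = (\boldsymbol{x}, \boldsymbol{y})$ for all vectors.

```python
def is_lorentz(A, tol=1e-12):
    """Check that the columns of A satisfy (A e_i, A e_j) = (e_i, e_j)
    for the form with signature (1, ..., 1, -1)."""
    m = len(A)
    g = [1.0] * (m - 1) + [-1.0]
    for i in range(m):
        for j in range(m):
            s = sum(g[k] * A[k][i] * A[k][j] for k in range(m))
            expected = g[i] if i == j else 0.0
            if abs(s - expected) > tol:
                return False
    return True


def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in A]


def square(x):
    """Scalar square (x^2) in an orthonormal basis."""
    return sum(v * v for v in x[:-1]) - x[-1] * x[-1]


A = [[1.25, 0.0, 0.75],
     [0.0, 1.0, 0.0],
     [0.75, 0.0, 1.25]]      # a sample matrix, chosen so that 1.25^2 - 0.75^2 = 1
x = [0.3, 0.4, 1.0]          # a timelike vector
print(is_lorentz(A), square(x), square(apply(A, x)))
```

The scalar square of the sample vector is unchanged by the transformation, so a timelike vector is carried to a timelike vector.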

Thus motions of the space $\mathbb{L}$ are projective transformations of the projective
space $\mathbb{P}(\mathsf{L})$ containing $\mathbb{L}$ and taking the quadratic form $(\boldsymbol{x}^2)$ into itself. By what
we have said thus far, the definition of the interior of the light cone $V$ can be written
in homogeneous coordinates in the form

$$x_1^2 + \cdots + x_n^2 - x_{n+1}^2 < 0, \tag{12.8}$$

and in inhomogeneous coordinates $y_i = x_i / x_{n+1}$ in the form

$$y_1^2 + \cdots + y_n^2 < 1. \tag{12.9}$$

We consider motions of a hyperbolic space as transformations of the set $\mathbb{L}$, that is,
as transformations taking the interior of the unit sphere given by condition (12.9)
into itself.

Let us write down some simple properties of motions: 

Property 12.1 The sequential application (composition) of two motions $f_1$ and $f_2$
(as transformations of the set $\mathbb{L}$) is again a motion.

This follows at once from the fact that the composition of nonsingular transformations
$\mathcal{A}_1$ and $\mathcal{A}_2$ is a nonsingular transformation, and this holds as well for the
corresponding projective transformations $\mathbb{P}(\mathcal{A}_1)$ and $\mathbb{P}(\mathcal{A}_2)$. Moreover, if $\mathcal{A}_1$ and
$\mathcal{A}_2$ are Lorentz transformations with respect to the bilinear form $\varphi(\boldsymbol{x}, \boldsymbol{y})$, then the
result of their composition has the same property.

Property 12.2 A motion is a bijection of $\mathbb{L}$ to itself.

This assertion follows from the fact that the corresponding transformations
$\mathcal{A} : \mathsf{L} \to \mathsf{L}$ and $\mathbb{P}(\mathcal{A}) : \mathbb{P}(\mathsf{L}) \to \mathbb{P}(\mathsf{L})$ are bijections. But by the definition of a hyperbolic
space, it is also necessary to verify that every line contained in the interior of the light
cone $V$ is the image of another such line. If we have the line $\langle \boldsymbol{a} \rangle$ with a timelike
vector $\boldsymbol{a}$, then we know already that there exists a vector $\boldsymbol{b}$ such that $\mathcal{A}(\boldsymbol{b}) = \boldsymbol{a}$.
Since $\mathcal{A}$ is a Lorentz transformation of the pseudo-Euclidean space $\mathsf{L}$, we have the
relationship $(\boldsymbol{b}^2) = (\mathcal{A}(\boldsymbol{b})^2) = (\boldsymbol{a}^2) < 0$, from which it follows that the vector $\boldsymbol{b}$ is
also timelike. Thus the transformation $\mathcal{A}$ takes the line $\langle \boldsymbol{b} \rangle$ lying inside $V$ into the
line $\langle \boldsymbol{a} \rangle$, also inside $V$.

Property 12.3 Like every bijection, a motion $f$ has an inverse transformation $f^{-1}$.
It is also a motion.

The verification of this property is trivial.

At first glance, it is not obvious that there are “sufficiently many” motions of a 
hyperbolic space. We shall establish this a bit later, but for now, we shall point out 
some important types of motions. 

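A transformation $g$ is of type (a) if $g = \mathbb{P}(\mathcal{A})$, where $\mathcal{A}$ is a Lorentz transformation
of the space $\mathsf{L}$ such that $\mathcal{A}(\boldsymbol{e}_{n+1}) = \boldsymbol{e}_{n+1}$.

Since the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_{n+1}$ of the pseudo-Euclidean space $\mathsf{L}$ is orthonormal, we
have the decomposition

$$\mathsf{L} = \langle \boldsymbol{e}_{n+1} \rangle \oplus \langle \boldsymbol{e}_{n+1} \rangle^{\perp}, \qquad \langle \boldsymbol{e}_{n+1} \rangle^{\perp} = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n \rangle, \tag{12.10}$$

and all transformations $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ with the indicated property take the subspace
$E_0 = \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n \rangle$ into itself.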

Conversely, if we define $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ as an orthogonal transformation of the Euclidean
subspace $E_0$ and set $\mathcal{A}(\boldsymbol{e}_{n+1}) = \boldsymbol{e}_{n+1}$, then $\mathbb{P}(\mathcal{A})$ will of course be a motion
of the hyperbolic space. In other words, these transformations can be described
as orthogonal transformations of inhomogeneous coordinates. All motions of the space $\mathbb{L}$
thus constructed have the fixed point $O$ corresponding to the line $\langle \boldsymbol{e}_{n+1} \rangle$
in $\mathsf{L}$, or in other words, the point $O = (0, \ldots, 0)$ in the inhomogeneous system of
coordinates $(y_1, \ldots, y_n)$.

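From the point of view of hyperbolic space, the constructed motions coincide precisely
with those motions that leave the point $O \in \mathbb{L}$ fixed. Indeed, as we have seen,
the point $O$ corresponds to the line $\langle \boldsymbol{e}_{n+1} \rangle$, and the motion $g$ is equal to $\mathbb{P}(\mathcal{A})$, where
$\mathcal{A}$ is a Lorentz transformation of the space $\mathsf{L}$. The condition $g(O) = O$ means that
$\mathcal{A}(\langle \boldsymbol{e}_{n+1} \rangle) = \langle \boldsymbol{e}_{n+1} \rangle$, that is, $\mathcal{A}(\boldsymbol{e}_{n+1}) = \lambda \boldsymbol{e}_{n+1}$. From the fact that $\mathcal{A}$ is a Lorentz
transformation, it follows that $\lambda^2 (\boldsymbol{e}_{n+1}^2) = (\boldsymbol{e}_{n+1}^2)$, and so $\lambda = \pm 1$. By multiplying $\mathcal{A}$ by $\pm 1$, which obviously
does not change the transformation $g = \mathbb{P}(\mathcal{A})$, we can arrange that the condition
$\mathcal{A}(\boldsymbol{e}_{n+1}) = \boldsymbol{e}_{n+1}$ is satisfied, whence by definition, it follows that $g$ is a transformation
of type (a).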

Type (b) is connected with a certain line $\mathbb{L}_1 \subset \mathbb{L}$ of a hyperbolic space. By definition,
the line $\mathbb{L}_1$ is determined by the plane $\mathsf{L}' \subset \mathsf{L}$, $\dim \mathsf{L}' = 2$. Since by assumption,
the plane $\mathsf{L}'$ must contain at least one timelike vector $\boldsymbol{x}$, it follows by Lemma 7.53
(p. 271) that it is a pseudo-Euclidean space. From formula (6.28) and Theorem 6.17
(law of inertia), it follows that all such spaces of a given dimension are isomorphic.
Therefore, we can choose a basis in $\mathsf{L}'$ with any convenient Gram matrix, provided
only that it defines a pseudo-Euclidean plane. We have seen (in Example 7.49,
p. 269) that it is convenient to choose as such a basis the lightlike vectors $\boldsymbol{f}_1, \boldsymbol{f}_2$,
for which

$$(\boldsymbol{f}_1^2) = (\boldsymbol{f}_2^2) = 0, \qquad (\boldsymbol{f}_1, \boldsymbol{f}_2) = \frac{1}{2},$$

and this means that for every vector $\boldsymbol{x} = x \boldsymbol{f}_1 + y \boldsymbol{f}_2$, its scalar square $(\boldsymbol{x}^2)$ is equal
to $xy$. In Example 7.61 (p. 277), we found explicit formulas for the Lorentz transformations
of a pseudo-Euclidean plane in such a basis:

$$\mathcal{U}(\boldsymbol{f}_1) = \alpha \boldsymbol{f}_1, \qquad \mathcal{U}(\boldsymbol{f}_2) = \alpha^{-1} \boldsymbol{f}_2, \tag{12.11}$$

$$\mathcal{U}(\boldsymbol{f}_1) = \alpha \boldsymbol{f}_2, \qquad \mathcal{U}(\boldsymbol{f}_2) = \alpha^{-1} \boldsymbol{f}_1, \tag{12.12}$$


where $\alpha$ is an arbitrary nonzero number. In the sequel we shall need only transformations
given by formula (12.11).

Since $\mathsf{L}'$ is a nondegenerate space, it follows by Theorem 6.9 that we have the
decomposition $\mathsf{L} = \mathsf{L}' \oplus (\mathsf{L}')^{\perp}$. Let us now define a linear transformation $\mathcal{A}$ of the
space $\mathsf{L}$ by the condition

$$\mathcal{A}(\boldsymbol{x} + \boldsymbol{y}) = \mathcal{U}(\boldsymbol{x}) + \boldsymbol{y}, \quad \text{where } \boldsymbol{x} \in \mathsf{L}', \ \boldsymbol{y} \in (\mathsf{L}')^{\perp}, \tag{12.13}$$

where $\mathcal{U}$ is one of the Lorentz transformations of the pseudo-Euclidean plane $\mathsf{L}'$
defined by formulas (12.11) and (12.12). It is clear that then $\mathcal{A}$ is a Lorentz transformation
of the space $\mathsf{L}$.

A motion of type (b) of the space $\mathbb{L}$ is a transformation $\mathbb{P}(\mathcal{A})$ obtained in the
case that in formula (12.13), we take as $\mathcal{U}$ the transformation given by relationships
(12.11). All motions thus constructed have a fixed line $\mathbb{L}_1$ corresponding to
the plane $\mathsf{L}'$.

It is quite obvious that motions of types (a) and (b) do not exhaust all motions of
the hyperbolic plane, even if in the definition of motions of type (b), as $\mathcal{U}$ in formula
(12.13) we were to use transformations $\mathcal{U}$ given not only by relationships (12.11),
but also by (12.12). For example, they certainly do not include motions associated
with Lorentz transformations that have a three-dimensional cyclic subspace (see
Corollary 7.66 and Example 7.67). However, for our further purposes, it will suffice
to use only motions of these two types.

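Example 12.4 In the sequel we are going to require explicit formulas for transformations
of type (b) in the case of the hyperbolic plane (that is, for $n = 2$). In this
case, $\mathsf{L}$ is a three-dimensional pseudo-Euclidean space, and in an orthonormal basis
$\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3$, that is, one such that $(\boldsymbol{e}_1^2) = (\boldsymbol{e}_2^2) = 1$ and $(\boldsymbol{e}_3^2) = -1$,
the scalar square of the vector $\boldsymbol{x} = x_1 \boldsymbol{e}_1 + x_2 \boldsymbol{e}_2 + x_3 \boldsymbol{e}_3$ is equal to $(\boldsymbol{x}^2) = x_1^2 + x_2^2 - x_3^2$.
The points of the hyperbolic plane $\mathbb{L}$ are contained in the affine plane
$x_3 = 1$, have inhomogeneous coordinates $x = x_1/x_3$ and $y = x_2/x_3$, and satisfy the
relationship $x^2 + y^2 < 1$.

For writing the transformation $\mathcal{A}$, let us consider the pseudo-Euclidean plane
$\mathsf{L}' = \langle \boldsymbol{e}_1, \boldsymbol{e}_3 \rangle$ and let us choose in it a basis consisting of lightlike vectors $\boldsymbol{f}_1, \boldsymbol{f}_2$
associated with the vectors $\boldsymbol{e}_1, \boldsymbol{e}_3$ by the relationships

$$\boldsymbol{f}_1 = \frac{\boldsymbol{e}_1 + \boldsymbol{e}_3}{2}, \qquad \boldsymbol{f}_2 = \frac{\boldsymbol{e}_1 - \boldsymbol{e}_3}{2}, \tag{12.14}$$

from which we also obtain the inverse formulas $\boldsymbol{e}_1 = \boldsymbol{f}_1 + \boldsymbol{f}_2$ and $\boldsymbol{e}_3 = \boldsymbol{f}_1 - \boldsymbol{f}_2$.
Let us note that the orthogonal complement $(\mathsf{L}')^{\perp}$ equals $\langle \boldsymbol{e}_2 \rangle$, and by Theorem 6.9,
we have the decomposition $\mathsf{L} = \mathsf{L}' \oplus \langle \boldsymbol{e}_2 \rangle$. Then in accord with formula
(12.13), for the vector $\boldsymbol{z} = \boldsymbol{x} + \boldsymbol{y}$, where $\boldsymbol{x} \in \mathsf{L}'$ and $\boldsymbol{y} \in \langle \boldsymbol{e}_2 \rangle$, we obtain the value
$\mathcal{A}(\boldsymbol{z}) = \mathcal{U}(\boldsymbol{x}) + \boldsymbol{y}$, where $\mathcal{U} : \mathsf{L}' \to \mathsf{L}'$ is the Lorentz transformation defined in the
basis $\boldsymbol{f}_1, \boldsymbol{f}_2$ by formula (12.11). From this, taking into account expression (12.14),
we obtain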


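$$\mathcal{U}(\boldsymbol{e}_1) = \frac{\alpha + \alpha^{-1}}{2}\, \boldsymbol{e}_1 + \frac{\alpha - \alpha^{-1}}{2}\, \boldsymbol{e}_3, \qquad \mathcal{U}(\boldsymbol{e}_3) = \frac{\alpha - \alpha^{-1}}{2}\, \boldsymbol{e}_1 + \frac{\alpha + \alpha^{-1}}{2}\, \boldsymbol{e}_3.$$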


Let us set

$$a = \frac{\alpha + \alpha^{-1}}{2}, \qquad b = \frac{\alpha - \alpha^{-1}}{2}. \tag{12.15}$$


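Then $a + b = \alpha$ and $a^2 - b^2 = 1$. It is obvious that any numbers $a$ and $b$ satisfying
the relationship $a^2 - b^2 = 1$ can be defined in terms of the number $\alpha = a + b$ by formulas
(12.15), since then $a - b = \alpha^{-1}$. Therefore, we obtain the linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$, for which

$$\mathcal{A}(\boldsymbol{e}_1) = a \boldsymbol{e}_1 + b \boldsymbol{e}_3, \qquad \mathcal{A}(\boldsymbol{e}_2) = \boldsymbol{e}_2, \qquad \mathcal{A}(\boldsymbol{e}_3) = b \boldsymbol{e}_1 + a \boldsymbol{e}_3.$$

It is easy to see that for such a transformation, the vector $\boldsymbol{x} = x_1 \boldsymbol{e}_1 + x_2 \boldsymbol{e}_2 + x_3 \boldsymbol{e}_3$ is
carried to the vector

$$\mathcal{A}(\boldsymbol{x}) = (a x_1 + b x_3) \boldsymbol{e}_1 + x_2 \boldsymbol{e}_2 + (b x_1 + a x_3) \boldsymbol{e}_3.$$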


In inhomogeneous coordinates, $x = x_1/x_3$ and $y = x_2/x_3$. This means that a point
with coordinates $(x, y)$ is carried to the point with coordinates $(x', y')$, where

$$x' = \frac{a x + b}{b x + a}, \qquad y' = \frac{y}{b x + a}. \tag{12.16}$$
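Formula (12.16) lends itself to a quick numerical experiment. The following Python sketch is an illustration only; the function name `type_b_motion` and the sample values are ours. It builds the map $(x, y) \mapsto (x', y')$ from a nonzero real $\alpha$ via (12.15) and applies it to the center $O$ and to another point of the disk $x^2 + y^2 < 1$.

```python
def type_b_motion(alpha):
    """Return the map (x, y) -> (x', y') of formula (12.16),
    with a and b computed from alpha by formulas (12.15)."""
    a = (alpha + 1 / alpha) / 2
    b = (alpha - 1 / alpha) / 2

    def motion(x, y):
        d = b * x + a          # nonzero inside the disk, since |b| < |a|
        return (a * x + b) / d, y / d

    return motion


g = type_b_motion(2.0)         # here a = 1.25 and b = 0.75
x1, y1 = g(0.0, 0.0)           # image of the center O
x2, y2 = g(0.6, -0.7)          # image of another point of the disk
print((x1, y1), x2 * x2 + y2 * y2 < 1)
```

Replacing $\alpha$ by $\alpha^{-1}$ keeps $a$ and changes the sign of $b$, which gives the inverse motion; thus `type_b_motion(0.5)` undoes `type_b_motion(2.0)`.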


This particular type of motion yields, however, an important general property: 


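Theorem 12.5 For every pair of points of a hyperbolic space there exists a motion
taking one point into the other.

Proof Let the first point correspond to the line $\langle \boldsymbol{a} \rangle$, and the second to the line $\langle \boldsymbol{b} \rangle$,
where $\boldsymbol{a}, \boldsymbol{b} \in \mathsf{L}$. If the vectors $\boldsymbol{a}$ and $\boldsymbol{b}$ are proportional, that is, $\langle \boldsymbol{a} \rangle = \langle \boldsymbol{b} \rangle$, then our
requirements will be satisfied by the identity transformation of the space $\mathbb{L}$ (which
can be obtained in the form $\mathbb{P}(\mathcal{E})$, where $\mathcal{E}$ is the identity transformation of the
space $\mathsf{L}$).

But if $\langle \boldsymbol{a} \rangle \neq \langle \boldsymbol{b} \rangle$, that is, $\dim \langle \boldsymbol{a}, \boldsymbol{b} \rangle = 2$, then let us set $\mathsf{L}' = \langle \boldsymbol{a}, \boldsymbol{b} \rangle$. Let us consider
the Lorentz transformation $\mathcal{U} : \mathsf{L}' \to \mathsf{L}'$ given by formula (12.11), as in the definition of
motions of type (b), the corresponding Lorentz transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ defined by formula (12.13), and
the projective transformation $\mathbb{P}(\mathcal{A}) : \mathbb{P}(\mathsf{L}) \to \mathbb{P}(\mathsf{L})$.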

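Let us show that the constructed projective transformation $\mathbb{P}(\mathcal{A})$ takes the point
corresponding to the line $\langle \boldsymbol{a} \rangle$ to the point corresponding to the line $\langle \boldsymbol{b} \rangle$, that is, the
linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ takes the line $\langle \boldsymbol{a} \rangle$ to the line $\langle \boldsymbol{b} \rangle$. Since the vectors $\boldsymbol{a}$
and $\boldsymbol{b}$ are contained in the plane $\mathsf{L}'$, then by definition, it suffices for us to prove
that for an appropriate choice of the number $\alpha$, the transformation $\mathcal{U} : \mathsf{L}' \to \mathsf{L}'$ given by
formula (12.11) takes the line $\langle \boldsymbol{a} \rangle$ to the line $\langle \boldsymbol{b} \rangle$.

This is easily verified by a simple calculation using a basis $\boldsymbol{f}_1, \boldsymbol{f}_2$ of lightlike vectors, as in
formula (12.14), in the pseudo-Euclidean plane $\mathsf{L}'$. Let us consider the timelike
vectors $\boldsymbol{a} = a_1 \boldsymbol{f}_1 + a_2 \boldsymbol{f}_2$ and $\boldsymbol{b} = b_1 \boldsymbol{f}_1 + b_2 \boldsymbol{f}_2$. Since in the chosen basis, the
scalar square of a vector is equal to the product of its coordinates, it follows that
$(\boldsymbol{a}^2) = a_1 a_2 < 0$ and $(\boldsymbol{b}^2) = b_1 b_2 < 0$. From this, it follows in particular that all the
numbers $a_1, a_2, b_1, b_2$ are nonzero.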

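We obtain from formula (12.11) that $\mathcal{U}(\boldsymbol{a}) = \alpha a_1 \boldsymbol{f}_1 + \alpha^{-1} a_2 \boldsymbol{f}_2$, and the condition
$\langle \mathcal{U}(\boldsymbol{a}) \rangle = \langle \boldsymbol{b} \rangle$ means that $\mathcal{U}(\boldsymbol{a}) = \mu \boldsymbol{b}$ for some $\mu \neq 0$. This yields the relationships
$\alpha a_1 = \mu b_1$ and $\alpha^{-1} a_2 = \mu b_2$, that is,

$$\mu = \frac{\alpha a_1}{b_1}, \qquad a_2 = \alpha \mu b_2 = \frac{\alpha^2 a_1 b_2}{b_1}, \qquad \alpha^2 = \frac{a_2 b_1}{a_1 b_2} = \frac{a_1 a_2 b_1 b_2}{(a_1 b_2)^2}.$$

It is obvious that the latter relationship can be solved for a real number $\alpha$ if
$a_1 a_2 b_1 b_2 > 0$, and this inequality is satisfied, since by assumption, $a_1 a_2 < 0$ and
$b_1 b_2 < 0$. □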
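The computation at the end of the proof can be checked numerically. The Python sketch below is a minimal illustration, assuming that the two timelike vectors are given by their coordinates $(a_1, a_2)$ and $(b_1, b_2)$ in a lightlike basis in which the scalar square is the product of the coordinates; the function name `alpha_for` and the sample vectors are ours.

```python
import math


def alpha_for(a, b):
    """Return a real alpha such that the transformation (12.11) takes the
    line <a> to the line <b>; a and b are coordinate pairs in a lightlike
    basis with (x^2) = x1 * x2."""
    a1, a2 = a
    b1, b2 = b
    if a1 * a2 >= 0 or b1 * b2 >= 0:
        raise ValueError("both vectors must be timelike")
    # alpha^2 = a2*b1 / (a1*b2), which is positive for timelike a and b
    return math.sqrt((a2 * b1) / (a1 * b2))


a, b = (1.0, -4.0), (-3.0, 2.0)
alpha = alpha_for(a, b)
image = (alpha * a[0], a[1] / alpha)     # U(a) by formula (12.11)
# U(a) is proportional to b exactly when this 2x2 determinant vanishes
print(alpha, abs(image[0] * b[1] - image[1] * b[0]) < 1e-12)
```

Only the positive square root is returned here; its negative is an equally valid choice of $\alpha$.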


Let us note that we have thus far not used motions of type (a). We shall need
them to strengthen the theorem we have just proved. To do so, we shall make use of
the notion of a flag, analogous to that introduced in Sect. 3.2 for real vector spaces.

Definition 12.6 A flag in a space $\mathbb{L}$ is a sequence of subspaces

$$\mathbb{L}_0 \subset \mathbb{L}_1 \subset \cdots \subset \mathbb{L}_n = \mathbb{L} \tag{12.17}$$

such that:

(a) $\dim \mathbb{L}_i = i$ for all $i = 0, 1, \ldots, n$;

(b) each pair of subspaces $(\mathbb{L}_{i+1}, \mathbb{L}_i)$ is directed.

A subspace $\mathbb{L}_i$ is a hyperplane in $\mathbb{L}_{i+1}$, and as we have seen (see formula (12.5)),
it defines a partition of $\mathbb{L}_{i+1}$ into two half-spaces: $\mathbb{L}_{i+1} \setminus \mathbb{L}_i = \mathbb{L}_{i+1}^+ \cup \mathbb{L}_{i+1}^-$. And as
earlier, the pair $(\mathbb{L}_{i+1}, \mathbb{L}_i)$ is said to be directed if the order of the half-spaces is
indicated, for example by denoting them by $\mathbb{L}_{i+1}^+$ and $\mathbb{L}_{i+1}^-$. Let us note that in a
flag defined by the sequence (12.17), the subspace $\mathbb{L}_0$ has dimension 0, that is, it
consists of a single point. We shall call this point the center of the flag (12.17).

Theorem 12.7 For any two flags of a hyperbolic space, there exists a motion taking
the first flag to the second. Such a motion is unique.

Proof In the space $\mathbb{L}$, let us consider two flags $\Phi$ and $\Phi'$ with centers at the points
$P \in \mathbb{L}$ and $P' \in \mathbb{L}$, respectively. Let $O \in \mathbb{L}$ be the point corresponding to the line
$\langle \boldsymbol{e}_{n+1} \rangle$ in $\mathsf{L}$, that is, the point with coordinates $y_1 = 0, \ldots, y_n = 0$ in relationship
(12.4). By Theorem 12.5, there exist motions $f$ and $f'$ taking $P$ to $O$ and $P'$ to $O$.
Then the flags $f(\Phi)$ and $f'(\Phi')$ have their centers at the point $O$. Each flag is by
definition a sequence of subspaces (12.17) in $\mathbb{L}$, to which correspond subspaces
of the vector space $\mathsf{L}$. Thus to the flags $f(\Phi)$ and $f'(\Phi')$ there correspond two
sequences of vector subspaces,

$$\langle \boldsymbol{e}_{n+1} \rangle = \mathsf{L}_0 \subset \mathsf{L}_1 \subset \cdots \subset \mathsf{L}_n = \mathsf{L} \quad \text{and} \quad \langle \boldsymbol{e}_{n+1} \rangle = \mathsf{L}'_0 \subset \mathsf{L}'_1 \subset \cdots \subset \mathsf{L}'_n = \mathsf{L},$$

where $\dim \mathsf{L}_i = \dim \mathsf{L}'_i = i + 1$ for all $i = 0, 1, \ldots, n$.

Let us recall that the space $\mathbb{L}$ is identified with a part of the affine Euclidean space
$E$, namely with the interior of the unit sphere $U \subset E$ given by relationship (12.4). To
investigate $\mathbb{L}$ as a part of $E$ (see Fig. 12.1), it will be convenient for us to associate
with each subspace $\mathsf{M} \subset \mathsf{L}$ containing the vector $\boldsymbol{e}_{n+1}$ the affine subspace $N \subset E$
of dimension one less containing the point $O$. To this end, let us first associate
with each subspace $\mathsf{M} \subset \mathsf{L}$ containing the vector $\boldsymbol{e}_{n+1}$ the vector subspace $\mathsf{N} \subset \mathsf{M}$
determined by the decomposition $\mathsf{M} = \langle \boldsymbol{e}_{n+1} \rangle \oplus \mathsf{N}$. Employing notation introduced
earlier, we obtain that

$$\mathsf{N} = \bigl(\langle \boldsymbol{e}_{n+1} \rangle^{\perp} \cap \mathsf{M}\bigr) = \bigl(\langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n \rangle \cap \mathsf{M}\bigr) \subset \langle \boldsymbol{e}_1, \ldots, \boldsymbol{e}_n \rangle = E_0,$$

that is, $\mathsf{N}$ is contained in the space of vectors of the affine space $E$. Consequently,
the vector subspace $\mathsf{N} \subset E_0$ determines a set of parallel affine subspaces in $E$ that
are characterized by their spaces of vectors coinciding with $\mathsf{N}$. Such affine subspaces
can be mapped to each other by a translation (see p. 296), and to determine one of
them uniquely, it suffices simply to designate a point contained in this subspace.
As such a point, we shall choose $O$. Then the vector subspace $\mathsf{N} \subset E_0$ uniquely
determines the affine subspace $N \subset E$, where clearly, $\dim N = \dim \mathsf{N} = \dim \mathsf{M} - 1$.
Thus we have established a bijection between $k$-dimensional vector subspaces
$\mathsf{M} \subset \mathsf{L}$ containing the vector $\boldsymbol{e}_{n+1}$ and $(k - 1)$-dimensional affine subspaces $N \subset E$
containing the point $O$. Here clearly, the notions of directedness for the pair $\mathsf{M}' \subset \mathsf{M}$
and for the pair $N' \subset N$ coincide. In particular, the flags $f(\Phi)$ and $f'(\Phi')$ of the space $\mathbb{L}$ with
center $O$ correspond to two particular flags of the affine Euclidean space $E$ with
center at the point $O$.

By Theorem 8.40 (p. 316), in an affine Euclidean space, there exists for every
pair of flags a motion that takes the first flag to the second. Since in our case, both
flags have a common center $O$, it follows that this motion has the fixed point $O$,
and by Theorem 8.39, it is an orthogonal transformation $\mathcal{A}$ of the Euclidean space
$E_0$. Let us consider $g = \mathbb{P}(\mathcal{A})$, the motion of type (a) of the space $\mathbb{L}$ corresponding
to this orthogonal transformation $\mathcal{A}$. Clearly, it takes the flag $f(\Phi)$ to $f'(\Phi')$, that
is, $g f(\Phi) = f'(\Phi')$. From this, we obtain that $f'^{-1} g f(\Phi) = \Phi'$, as asserted in the
theorem.

It remains to prove the assertion about uniqueness in the statement of the theorem.
Let $f_1$ and $f_2$ be two motions taking some flag $\Phi$ with center at the point $P$
to the same flag, that is, such that $f_1(\Phi) = f_2(\Phi)$. Then $f = f_1^{-1} f_2$ is a motion,
and $f(\Phi) = \Phi$. If we prove that $f$ is the identity transformation, then the required
equality $f_1 = f_2$ will follow.

By Theorem 12.5, there exists a motion $g$ taking the point $P$ to $O$. Let us set
$\Phi' = g(\Phi)$. Then $\Phi'$ is a flag with center at the point $O$. From the equalities $f(\Phi) = \Phi$
and $g(\Phi) = \Phi'$ it follows that $g f g^{-1}(\Phi') = \Phi'$. Let us denote the motion $g f g^{-1}$
by $h$. It clearly takes the flag $\Phi'$ to itself, and in particular, has the property that
$h(O) = O$. From what we said on p. 438, it follows that $h$ is a motion of type (a),
that is, $h = \mathbb{P}(\mathcal{A})$, where $\mathcal{A}$ is a Lorentz transformation of the space $\mathsf{L}$ that in turn
is determined by a certain orthogonal transformation $\mathcal{U}$ of the Euclidean space $E_0$.

Let $\Phi''$ be the flag in the Euclidean space $E_0$ corresponding to the flag $\Phi'$ of the
space $\mathbb{L}$. Then from the condition $h(\Phi') = \Phi'$ it follows that $\mathcal{U}(\Phi'') = \Phi''$, that
is, the transformation $\mathcal{U}$ takes the flag $\Phi''$ to itself. Consequently (see p. 225), the
transformation $\mathcal{U}$ is the identity, which yields that the motion $h$ that it defines is the
identity. From the relationship $h = g f g^{-1}$, it then follows that $g f = g$, that is, $f$ is
the identity transformation. □

Thus motions of a hyperbolic space possess the same property as that established
in Sect. 8.4 (p. 317) for motions of affine Euclidean spaces. It is this that explains
the special place of hyperbolic spaces in geometry. The Norwegian mathematician
Sophus Lie called this property “free mobility.” There exists a theorem (which we
shall not only not prove, but not even formulate precisely) showing that other than
the space of Euclid and the hyperbolic space of Lobachevsky, there is only one
space that exhibits this property, called a Riemann space (we shall have a bit to say
about this in Sect. 12.3). This assertion is called the Helmholtz-Lie theorem. For its
formulation, it would be necessary first of all to define just what we mean here by
“space,” but we are not going to delve into this.

The property that we have deduced (Theorem 12.7) suffices for discussing the
axiomatic foundations of hyperbolic geometry.


12.2 The Axioms of Plane Geometry* 

Hyperbolic geometry arose historically as a result of the analysis of the axiomatic
systems of Euclidean geometry. The viewpoint toward geometry as based on a small
number of postulates from which all the remaining results are derived by way of
formal proof arose in ancient Greece approximately in the sixth century B.C.E.
Tradition connects this viewpoint with the name Pythagoras. An account of geometry
with this point of view is contained in Euclid’s Elements (third century B.C.E.). This
point of view was accepted during the development of science in the modern era,
and for a long time, geometry was taught directly from Euclid’s books, and then
later, there appeared simplified accounts. Moreover, this same point of view came
to permeate all of mathematics and physics. In this spirit were written, for example,
Newton’s The Mathematical Principles of Natural Philosophy, known as the Principia.
In physics and generally in the natural sciences, “laws of nature” played the
role of axioms.

In mathematics, this direction of thought led to a more thorough working out of
the axiom system of Euclidean geometry. Euclid divides the assertions on which his
exposition is based into three types. One he calls “definitions”; another, “axioms”;
and the third, “postulates” (the principle separating the last two of these is unclear
to modern researchers). Many of his “definitions” also seem questionable. For example,
the following: “A line is a length without width” (definitions of “length”
and “width” are not given). Some “axioms” and “postulates” (we shall call all of
these axioms) are simple corollaries of others, so that they could as well have been
discarded. But what attracted the most attention was the “fifth postulate,” which in
Euclid is formulated thus:

That if a straight line falling on two straight lines makes the interior angles on the same side
less than two right angles, the two straight lines, if produced indefinitely, meet on that side
on which are the angles less than the two right angles.

This axiom differs from the others in that its formulation is notably more complex.
Therefore, the following question arose (probably already in antiquity): can
this assertion be proved as a theorem derived from the other axioms? An enormous
number of “proofs of the fifth postulate” appeared, in which, however, there was
always found a logical error. These investigations nevertheless helped in clarifying
the situation. For example, it was proved that in the context of the other axioms,
the fifth postulate is equivalent to the following assertion about parallel lines that is
now usually presented as this postulate: through every point A not lying on a line
a, it is possible to construct exactly one line b parallel to a (lines a and b are said
to be parallel if they do not intersect). Here the existence of a line b parallel to a
and passing through the point A can easily be proved. The entire content of the fifth
postulate is reduced to the assertion about its uniqueness.

Finally, at the beginning of the nineteenth century, a number of researchers, one
of whom was Nikolai Ivanovich Lobachevsky (1792-1856), came up with the idea
that a proof of the fifth postulate is impossible, and so its negation leads to a new
geometry, logically no less perfect than the geometry of Euclid, even though it contains
in some respects some unusual propositions and relationships.

The question could be posed more precisely as a result of the development of the
axiomatic method. This was done by Moritz Pasch (1843-1930), Giuseppe Peano
(1858-1932), and David Hilbert (1862-1943) at the end of the nineteenth century.
In his work on the foundations of geometry, Hilbert formulated in particular the
principles on which an axiomatic system is constructed. Today, such an approach
has become commonplace; we used it to define vectors and Euclidean spaces. The
general principle consists in fixing a certain set of objects, which remain undefined
(for example, in the case of the definition of a vector space, these were scalars and
vectors), and also in fixing certain relations that are to exist among these objects,
which are likewise undefined (in the case of the definition of a vector space, these
were addition of vectors and multiplication of a vector by a scalar). Finally, axioms
are introduced that establish the specific properties of the introduced concepts (in the
case of the definition of a vector space, these were enumerated in Sect. 3.1). With
such a formulation, there remains only the question of consistency of the theory,
that is, whether it is possible from the given axioms to derive simultaneously some
statement as well as its negation. In the sequel, we shall introduce an axiom system
for hyperbolic geometry (restricting ourselves to the case of dimension 2) and discuss the
question of its consistency.

Let us begin with a discussion of axioms. The lists of axioms that Hilbert and
his predecessors introduced in their early work turned out to possess certain logical
defects. For example, in deduction, it turned out to be necessary to use certain
assertions that were not contained among the axioms. Hilbert then supplemented
his system of axioms. Later, this system of axioms was simplified for the sake of
clarity. We shall use the axiom system proposed by the German geometer Friedrich
Schur (1856-1932).¹ Here we shall restrict our attention (exclusively for the sake of
brevity) to the axiomatics of the plane.

A plane is a certain set $\Pi$, whose elements A, B, and so on, are called points.
Certain bijective mappings $f : \Pi \to \Pi$ are called motions. These are the fundamental
objects. The relationships among them are expressed as follows:

(A) Certain distinguished subsets $l$, $l'$, and so on, of the set $\Pi$ are called lines. That
an element $A \in \Pi$ belongs to the subset $l$ is expressed by saying that “the point
A lies on the line $l$” or “the line $l$ passes through the point A.”

(B) For three given points A, B, C lying on a given line $l$, it is specified when the
point C is considered to lie between the points A and B. This must be specified
for every line $l$ and for every three points lying on it.

These objects and relations satisfy the conditions called axioms, which it is
convenient to collect into several groups:

I. Axioms of relationship 

1. For every two points, there exists a line passing through them. 

2. If these points are distinct, then such a line is unique. 

3. On every line there lie at least two points. 

4. For every line, there exists a point not lying on it. 

II. Axioms of order 

1. If on some line $l$, the point C lies between points A and B, then it is distinct
from them and also lies between points B and A.


¹ Here we shall follow the ideas of Boris Nikolaevich Delaunay, or Delone (1890-1980), in his
pamphlet Elementary Proof of the Consistency of Hyperbolic Geometry, 1956.

Fig. 12.3 Intersection of the sides of a triangle by a line




2. If A and C are two distinct points on some line, then on this line there is at 
least one point B such that C lies between points A and B . 

3. Among three points A, B, and C lying on a given line, not more than one of 
the points lies between the two others. 

Before formulating the last axiom of this group, let us give some new definitions.
The set of all points C on a given line $l$ passing through the points A and B that
lie between them (including the points A and B themselves) is called a segment
with endpoints A and B, and is denoted by [A, B]. Axiom 2 of group II can be
reformulated thus: $[A, C] \neq l \setminus (A \cup C)$, with the inequality here being understood
as an inequality of sets. That a segment [A, B] contains points other than A and B
is proved on the basis of the axioms of group I and the last axiom of group II, to
the formulation of which we now turn. Three points A, B, C not all lying on any
one line are called a triangle, and this relationship is denoted by [A, B, C]. The
segments [A, B], [B, C], and [C, A] are called the sides of the triangle [A, B, C].

4. Pasch’s axiom. If points A, B, C do not all lie on the same line, none of them
belongs to the line $l$, and the line $l$ intersects one side of the triangle [A, B, C],
then it also intersects another side of the triangle.

In other words, if a line $l$ has a point D in common with the line $l'$ passing
through points A and B, with D lying between A and B on $l'$, then the line $l$ either
has a common point E with the line $l_1$ passing through B and C, with E lying
between them on $l_1$, or has a common point F with the line $l_2$ passing through A
and C, with F lying between them on $l_2$. The two cases discussed in this last axiom
are depicted in Fig. 12.3.

III. Axioms of motion 

1. For every motion $f$, the inverse mapping $f^{-1}$ (which exists by the definition
of a motion as a bijective mapping of the set $\Pi$) is also a motion.

2. The composition of two motions is a motion.

3. A motion preserves the order of points. That is, a motion $f$ takes a line $l$ to
a line $f(l)$, and if the point C on the line $l$ lies between points A and B on
this line, then the point $f(C)$ of the line $f(l)$ lies between points $f(A)$ and
$f(B)$.

The formulation of the fourth axiom of motion requires certain results that can be
obtained as corollaries of the axioms of relationship and order. We shall not prove
these here, but shall give only the formulations.²

Let us begin with properties of lines. Let us choose a point O on a line $l$. Points
A and B on this same line, both of them different from O, are said to lie on one
side of O if O does not lie between A and B. If we select some point A different
from O, then points B different from O and lying together with A on one side of O
form a subset of the set of points of the line $l$ called a half-line and denoted by $l^+$.
It can be proved that if we choose in this subset another point A', then the half-line
formed with it will be the same as before. Here what is important is only the choice
of the point O. If we choose a point $A_1$ such that O lies between A and $A_1$, then
the point $A_1$ determines another half-line, denoted by $l^-$. The half-lines $l^+$ and $l^-$
determined by the points A and $A_1$ do not intersect, and their union is $l \setminus O$, that is,
$l^+ \cap l^- = \emptyset$ and $l^+ \cup l^- = l \setminus O$.

One can verify analogous properties for a line $l$ in the plane $\Pi$. Let us consider
two points A and B that do not belong to the line $l$. One says that they lie on one
side of $l$ if either the line $l'$ passing through them does not intersect the line $l$, or the
lines $l$ and $l'$ intersect in a point C that does not lie between points A and B of the
line $l'$. The set of points not lying on the line $l$ and lying on the same side of $l$ as the
point A is called a half-plane. Again, it is possible to prove that with the choice of
another point A' instead of A in this half-plane, we define the same set. There exist
two points A and A' that do not belong to the same half-plane. However we select
these points (given a fixed line $l$), we will always obtain two subsets $\Pi^+$ and $\Pi^-$
of the plane $\Pi$ such that $\Pi^+ \cap \Pi^- = \emptyset$ and $\Pi^+ \cup \Pi^- = \Pi \setminus l$.

Suppose we are given a point O and a line $l$ passing through it. If in the partition
of $l \setminus O$ into two half-lines, one of them is distinguished, and in the partition of $\Pi \setminus l$
into two half-planes, one of them is distinguished (for example, let us denote them
by $l^+$ and $\Pi^+$, respectively), then the pair $(O, l)$ is called a flag and is denoted
by $\Phi$. As follows from what was discussed in Sect. 12.1, this is a special case (for
$n = 2$) of the notion of a flag introduced earlier.

Every motion takes a flag to a flag, that is, if $f$ is a motion and $\Phi$ is the flag
$(O, l)$, then the sets $f(l)^+$ and $f(l)^-$, whose union is $f(l) \setminus f(O)$, coincide with
$f(l^+)$ and $f(l^-)$, where $l^+$ and $l^-$ are the half-lines on the line $l$ determined by
the point O. Here their order can change. Analogously, the pair of half-planes $f(\Pi)^+$
and $f(\Pi)^-$ defined by the line $f(l)$ coincides with the pair $f(\Pi^+)$ and $f(\Pi^-)$,
where $\Pi^+$ and $\Pi^-$ are the half-planes determined by the line $l$. Their order also
can change.

We can now formulate the last (fourth) axiom of motion: 

4. Axiom of free mobility. For any two flags $\Phi$ and $\Phi'$, there exists a motion $f$
taking the first flag to the second, that is, $f(\Phi) = \Phi'$. Such a motion is unique,
and it is uniquely determined by the flags $\Phi$ and $\Phi'$.


² Some of these are proved in first courses in geometry, and in any case, elementary proofs of all of
these results can be found in Chap. 2 of the book Higher Geometry, by N.V. Efimov (Mir, 1953).

IV. Axiom of continuity

1. Let a set of points of some line $l$ be represented arbitrarily as the union of
two sets $M_1$ and $M_2$, where no point of the set $M_1$ lies between two points
of the set $M_2$, and conversely. Then there exists a point O on the line $l$ such
that $M_1$ and $M_2$ coincide with the half-lines of $l$ determined by the point O,
to either of which the point O can be joined.

This axiom is also called Dedekind’s axiom. 

Axioms I-IV that we have presented are called axioms of “absolute geometry.”
They hold for both Euclidean and hyperbolic geometry. These two geometries are 
distinguished by the addition of one axiom that deals with parallel lines. Let us 
recall that parallel lines are lines having no points in common. Thus in both cases, 
one more axiom is added: 

V. Axiom of parallel lines 

1. In Euclidean geometry: For every line $l$ and every point A not lying on it,
there exists at most one line $l'$ passing through the point A and parallel to $l$.

1'. In hyperbolic geometry: For every line $l$ and every point A not lying on it,
there exist at least two distinct lines $l'$ and $l''$ passing through the point A and parallel to $l$.

The justified interest in precisely these two axioms is due to the fact that already
in absolute geometry (that is, with only the axioms from groups I-IV), it is possible
to prove that for every line $l$ and every point A not on $l$, there exists at least one line
$l'$ passing through A and parallel to $l$.

It is now possible to formulate more precisely the goal that mathematics set for
itself in the attempt to “prove the fifth postulate,” that is, to derive assertion 1 in
group V of axioms from axioms in groups I-IV. But Lobachevsky (and other researchers
of the same epoch) came to the conclusion that this was impossible, and
this meant that the system comprising groups I-IV and axiom 1' was consistent.

Strictly speaking, we could have posed such questions even earlier, in connection
with any of the theories that we encountered based on some system of axioms,
such as the theory of vector spaces or that of Euclidean spaces. The question of the
consistency of the concepts of vector spaces or Euclidean spaces is easily answered:
it suffices to show (in the case of real spaces) examples of vector spaces $\mathbb{R}^n$ of
any finite dimension or Euclidean spaces with inner product $(x, y) = x_1 y_1 + \cdots + x_n y_n$.
Of course, this assumes the construction and proof of the consistency of the
theory of the real numbers, but that lies outside the scope of our investigation, and
we shall not consider it here. However, assuming as given that the properties of real
numbers are defined and do not raise any doubts, we may, for example, say that if
the system of axioms of a real vector space given in Sect. 3.1 were inconsistent, then
we would be able to derive two mutually contradictory assertions about the space
$\mathbb{R}^n$. However, any assertion about the space $\mathbb{R}^n$ can be reduced by definition to an
assertion about the real numbers, and then we would obtain a contradiction in the
domain of real numbers.

The same question could be posed in relationship to Euclidean geometry, that
is, with respect to the system of axioms consisting of axioms of groups I-IV and
axiom 1 of group V. Here the answer is in fact already known, since we have constructed
the theory of affine Euclidean spaces (even in arbitrary dimension $n$). It is
easily ascertained that for $n = 2$, all the axioms of Euclidean geometry that we introduced
are satisfied. Some refinements are perhaps necessary only in connection
with the axioms of order.

These axioms do not require an inner product on the space and are formulated
for an arbitrary real affine space V in Sect. 8.2. All the assertions constituting the
axioms of order now follow directly from the properties of order of the real numbers,
except only Pasch’s axiom. Its idea is that if a line “enters” a triangle, then it
must “exit” from it. Intuitively, this is quite convincing, but with our approach, we
must derive this assertion from the properties of affine spaces. It is a very simple
argument, whose details we leave to the reader.

Specifically, by what is given, points A and B (we shall use the same notation
as in the formulation of the axioms) lie in different half-planes into which the line
$l$ divides the plane $\Pi$. Everything depends on the half-plane to which the point C
belongs: to the same one as A, or to the same one as B. In the first case, the line $l$
has a common point with the line $l_1$, which lies on it between B and C, while in the
second case, the common point is with the line $l_2$, which lies between A and C; see
Fig. 12.3. In each of these two cases, the assertion of Pasch’s axiom is easy to verify
if we recall the definitions.
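
The case analysis above can be sketched numerically. The following Python fragment is only an illustration under stated assumptions, not part of the argument of the text: a line of the coordinate plane is represented by hypothetical coefficients (a, b, c) of the equation ax + by + c = 0, the half-plane containing a point is read off from the sign of ax + by + c, and the names side, crosses, and sides_crossed are introduced here for illustration only.

```python
def side(line, p):
    """Return +1 or -1 according to the open half-plane containing p, or 0 if p is on the line."""
    a, b, c = line
    value = a * p[0] + b * p[1] + c
    return (value > 0) - (value < 0)


def crosses(line, p, q):
    """True if the line meets the segment [p, q] strictly between p and q.

    Assumes that neither p nor q lies on the line.
    """
    return side(line, p) * side(line, q) < 0


def sides_crossed(line, A, B, C):
    """List the sides of the triangle [A, B, C] that the line intersects."""
    if any(side(line, P) == 0 for P in (A, B, C)):
        raise ValueError("no vertex may lie on the line")
    pairs = {"AB": (A, B), "BC": (B, C), "CA": (C, A)}
    return [name for name, (p, q) in pairs.items() if crosses(line, p, q)]


line = (1, 0, -1)                                   # assumption: the sample line x = 1
print(sides_crossed(line, (0, 0), (4, 0), (0, 4)))  # C lies on the same side as A
print(sides_crossed(line, (0, 0), (4, 0), (4, 4)))  # C lies on the same side as B
```

In the first call the line crosses the sides AB and BC, and in the second the sides AB and CA, in agreement with the two cases of Fig. 12.3. A line that separates no pair of vertices crosses no side at all.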

We in fact checked in one form or another that the remaining axioms are satisfied 
even as assertions that relate to arbitrary dimension. 

We shall now turn to the axioms of hyperbolic geometry, that is, the axioms of
groups I-IV and axiom 1' of group V. We shall prove that they are consistent, based
on the consistency of the usual properties (which likewise are easily reduced to
certain axioms) of the set of real numbers $\mathbb{R}$ and based on the theory of Euclidean
spaces of dimension 2 and 3 constructed on this basis. On this foundation, we shall
prove the following result.

Theorem 12.8 The system of axioms of hyperbolic geometry is consistent.

Proof We shall consider in the Euclidean plane L the open disk K (given, for example,
in some coordinate system by the condition $x^2 + y^2 < 1$). We shall call the set
of its points a “plane” (denoted by $\Pi$), and we shall call “points” only the points of
this disk. The intersection of every line $l$ of the plane L with the disk K that has at
least one point in common with this disk is the interior of some segment (this was
proved in the previous section). We shall call such nonempty intersections $l \cap K$
“lines,” denoted by $\bar{l}$, $\bar{l}'$, and so on. Finally, we shall call a projective transformation
of the plane L taking the disk K into itself a “motion.”
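
As a hedged illustration of such a “motion,” the following Python sketch uses one particular family of projective transformations, chosen here as an assumption and not taken from the text: those given in homogeneous coordinates by the matrices with rows (cosh t, 0, sinh t), (0, 1, 0), (sinh t, 0, cosh t). These matrices preserve the form $x^2 + y^2 - z^2$, and therefore take the disk K into itself. The code checks this, together with the preservation of collinearity, on sample points only; all function names are hypothetical.

```python
import math


def boost(t):
    """Matrix of a sample transformation preserving the form x^2 + y^2 - z^2."""
    c, s = math.cosh(t), math.sinh(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [s, 0.0, c]]


def apply_projective(M, p):
    """Apply the projective transformation with matrix M to the point p = (x, y)."""
    x, y = p
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in M)
    return (u / w, v / w)  # return from homogeneous to affine coordinates


def in_disk(p):
    """True if p is a 'point', that is, lies in the open disk x^2 + y^2 < 1."""
    return p[0] ** 2 + p[1] ** 2 < 1.0


def collinear(p, q, r, tol=1e-12):
    """True if p, q, r lie on one Euclidean line, up to a rounding tolerance."""
    det = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(det) < tol


M = boost(0.7)                                     # assumption: an arbitrary parameter value
points = [(0.2, -0.5), (0.25, -0.2), (0.3, 0.1)]   # three collinear points of the disk
images = [apply_projective(M, p) for p in points]
print(all(in_disk(q) for q in images), collinear(*images))
```

The check is numerical and covers three points only, so it illustrates the definition and proves nothing. That every such transformation takes “lines” to “lines” and preserves the relation “lies between” is the content of the previous section.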

Since the definition of projective transformation assumes a study of the projective
plane, and a projective space of dimension $n$ and its projective transformations
were defined in Chap. 9 in terms of a vector space of dimension $n + 1$, it follows
that for the analysis of the hyperbolic plane, we must use here a notion connected
with a three-dimensional vector space. However, it would not be difficult to give a
formulation appealing only to properties of the Euclidean plane.

Fig. 12.4 “Lines” and “points” of the hyperbolic plane



Now let us define the fundamental relationships between “lines” and “points.”
That a “line” $\bar{l}$ passes through a “point” $A \in \Pi$ will be understood to mean the
condition that the line $l$ passes through the point A. Thus an arbitrary “line” $\bar{l}$ is the
set of “points” that lie on it. Let “points” A, B, C lie on the “line” $\bar{l}$. We shall say
that a “point” C lies between “points” A and B if such is the case for A, B, and C as
points on the Euclidean line $l$ that contains $\bar{l}$ (this makes sense, since $l$ is contained
in Euclidean space).

It remains to verify that the notions and relationships presented satisfy the axioms
of hyperbolic geometry, that is, the axioms of groups I-IV and axiom 1' of group V.
The verification of this for the axioms of groups I, II, and IV is trivial, since the
corresponding objects and relationships are defined exactly as in the surrounding
Euclidean plane. For the axioms of group III (axioms of motion), the required properties
were proved in the previous section (indeed, for the case of a space of arbitrary
dimension $n$). It remains only to consider axiom 1' of group V.

Let $\bar{l}$ be the “line” associated with the line $l$ in the Euclidean plane L. Then the
line $l$ intersects the boundary S of the disk K in two different points: $P'$ and $P''$.
Let A be a “point” of the “plane” $\Pi$ (that is, a point of the disk K) not lying on
the line $l$. By the axioms of Euclidean geometry, through the points A and $P'$ in
the plane L, there passes some line $l'$. It determines the “line” $\bar{l}' = l' \cap K$ of the
“plane” $\Pi$. Similarly, the point $P''$ determines the “line” $\bar{l}'' = l'' \cap K$; see Fig. 12.4.

The lines $l'$ and $l''$ are distinct, since they pass through different points $P'$ and
$P''$ of the plane L. Therefore, by the axioms of Euclidean geometry, they have no
common points other than A. But the “lines” $\bar{l}'$ and $\bar{l}''$, as nonempty segments of
Euclidean lines excluding the endpoints, contain infinitely many points and in particular,
the “points” $B' \in \bar{l}'$ and $B'' \in \bar{l}''$, with $B' \neq B''$. This means that the “lines”
$\bar{l}'$ and $\bar{l}''$ are distinct. On the other hand, in the sense of our definitions, both of them
are parallel to the “line” $\bar{l}$, that is, they have no common “points” with it (points
of the disk K). For example, the line $l'$ has with $l$ the common point $P'$ in the Euclidean
plane L, which means that by the axioms of Euclidean geometry, they have
no other common points, and in particular, no common points in the disk K.

We see that assertion 1' holds for every “line” $\bar{l} \subset \Pi$ and every “point” $A \notin \bar{l}$.
Let us now assume that from the axioms of hyperbolic geometry there could be
derived an inconsistency (that is, some assertion and its negation). Then we could
apply the same reasoning to the notions that earlier, in the proof of Theorem 12.8,
we wrote in quotation marks: “point,” “plane,” “line,” and “motion.” Since they,
as we have seen, satisfy all the axioms of hyperbolic geometry, we would again
arrive at a contradiction. But the notions “plane,” “line,” and “motion,” and also
the relationship “lies between” for three points on a line were defined in terms of
Euclidean geometry. Thus we would arrive at a contradiction to Euclidean geometry
itself. □
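
The verification of axiom 1' in this proof can be replayed on a concrete example. The Python sketch below is an illustration under stated assumptions, not the authors' construction in code: the boundary points P' = (1, 0) and P'' = (-3/5, 4/5) and the point A = (0, -1/2) are sample values chosen here, exact rational arithmetic from the standard fractions module is used so that "lies on the boundary" can be tested exactly, and all function names are hypothetical.

```python
from fractions import Fraction as F


def line_through(p, q):
    """Coefficients (a, b, c) of the Euclidean line a*x + b*y + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersection(l1, l2):
    """Common point of two Euclidean lines, or None if they are parallel or equal."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    d = a1 * b2 - a2 * b1
    if d == 0:
        return None
    return ((b1 * c2 - b2 * c1) / d, (c1 * a2 - c2 * a1) / d)


def in_open_disk(p):
    """True if p is a 'point' of the model, that is, x^2 + y^2 < 1."""
    return p[0] ** 2 + p[1] ** 2 < 1


def klein_parallel(l1, l2):
    """Two 'lines' are parallel if the Euclidean lines have no common point in the disk."""
    p = intersection(l1, l2)
    return p is None or not in_open_disk(p)


# Assumed sample data: two points of the boundary circle and a point of the disk.
P1, P2 = (F(1), F(0)), (F(-3, 5), F(4, 5))
A = (F(0), F(-1, 2))
l = line_through(P1, P2)    # the Euclidean line containing the given 'line'
l1 = line_through(A, P1)    # the line l' through A and P'
l2 = line_through(A, P2)    # the line l'' through A and P''
print(klein_parallel(l1, l), klein_parallel(l2, l), klein_parallel(l1, l2))
```

Here $l'$ meets $l$ only at P' and $l''$ meets $l$ only at P'', both on the boundary circle and hence outside the open disk, so both “lines” through A are parallel to the given one. The “lines” $\bar{l}'$ and $\bar{l}''$ themselves meet at A and are therefore distinct and not parallel to each other.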

Let us focus attention on this fine logical construction: we construct objects in
some domain that satisfy a certain system of axioms, and thus we prove the consistency
of this system if the consistency of the domain from which the necessary
objects are taken has been accepted. Today, one says that a model of this axiom
system has thereby been constructed in another domain. In particular, we earlier
constructed a model of hyperbolic geometry in the theory of vector spaces. Only by
constructing such a model was the question of the provability of the “fifth postulate”
decided in mathematics.

In conclusion, it is of interest to dwell a bit on the history of this question.
Independent of Lobachevsky, a number of researchers came to the conclusion that a
negation of the “fifth postulate” leads to a meaningful and consistent branch of mathematics,
a “new geometry,” eventually given the designation “non-Euclidean geometry.”
There is no question here of priority. All the researchers clearly worked independently
of one another (Gauss’s correspondence from the 1820s, Lobachevsky’s
publication of 1829, and János Bolyai’s of 1832). Most of those who became known
later were amateurs, not professional mathematicians. But there were some exceptions:
besides Lobachevsky, there was the greatest mathematician of that epoch,
Gauss. The majority of such researchers known to us who clearly arrived at the
same conclusions independently became known precisely because of their correspondence
with Gauss, which was published along with other of Gauss’s papers
after his death. It is clear from these publications that in his youth, Gauss had attempted
to prove the fifth postulate, but later concluded that there existed a meaningful
and consistent geometry that did not include this postulate. In his letters, Gauss
discussed the similar views of his correspondents with great interest.

He clearly received the work of Lobachevsky with sympathetic understanding
when it began to appear in translation, and on Gauss’s recommendation,
Lobachevsky was elected a member of the Göttingen Academy of Sciences.

In one of Gauss’s diaries can be seen the name Nikolai Ivanovich Lobachevsky, 
written in Cyrillic letters: 

НИКОЛАЙ ИВАНОВИЧ ЛОБАЧЕВСКИЙ

But it is surprising that Gauss himself, throughout his entire life, published not a
line on this subject. Why was that? The usual explanation is that Gauss was afraid
of not being understood. Indeed, in one letter in which he touched on the question
of the “fifth postulate” and non-Euclidean geometry, he wrote, “since I fear the
clamor of the Boeotians.” But it seems that this cannot be the full explanation of
his mysterious silence. In his other works, Gauss did not fear being misunderstood
by his readers.³ It is possible, however, that there is another explanation for Gauss’s
silence. He was one of the few who realized that however many interesting theorems
of non-Euclidean geometry might be deduced, this would prove nothing definitively;
there would always remain the theoretical possibility that future derivations would
yield a contradictory assertion. And perhaps Gauss understood (or sensed) that at
the time (first half of the nineteenth century), the mathematical concepts had not yet
been developed to pose and solve this question rigorously.

Apparently, Lobachevsky was among the small number of mathematicians in
addition to Gauss who understood this. For him, as with Gauss, there stood the
question of “incomprehensibility.” First of all, for Lobachevsky, there was the lack
of comprehension among Russian mathematicians, especially analysts, who totally
failed to accept his work. In any case, he constantly attempted to find a consistent
foundation for his geometry. For example, he discovered its striking parallel with
spherical geometry and expressed the idea that it was the “geometry of the sphere
with imaginary radius.” His geometry could indeed have been realized in the form
of some other model if the very notion of model had been sufficiently developed at
that time.

Beyond this (as noted by the French mathematician André Weil (1906-1998)),
here we have the simplest case of duality between compact and noncompact symmetric
spaces, discovered in the twentieth century by Élie Cartan.

Moreover, Lobachevsky proved that in three-dimensional hyperbolic space, there
is a surface (called today a horosphere) such that if we consider only the set of its
points and take as lines the curves of a specific type lying on it (called today horocycles),
then all the axioms of Euclidean geometry are satisfied. From this it follows
that if hyperbolic geometry is consistent, then Euclidean geometry is also consistent.
Even if we accept the hypothesis that the “fifth postulate” does not hold, Euclidean
geometry is still realized on the horosphere. Thus in principle, Lobachevsky came
very close to the concept of a model. But he did not succeed in constructing a model
of hyperbolic geometry in the framework of Euclidean geometry. Such a construction
was not easily granted to mathematicians.

The following paragraph offers only a hint, and not a precise formulation, of the
corresponding assertions.

First, in 1868, Eugenio Beltrami (1835-1899) constructed in three-dimensional
Euclidean space a certain surface called a pseudosphere or Beltrami surface, whose
Gaussian curvature (see the definition on p. 265) at every point is the same negative
number. Hyperbolic geometry can be realized on the pseudosphere, where the
role of lines is played by so-called geodesic lines.⁴ However, here we are talking
about only a piece of the pseudosphere and a piece of the hyperbolic plane. Here the
posing of the question must be radically changed, since the majority of the axioms
that we have given assume (as in, for example, Euclidean geometry) the possibility
of continuing lines to infinity. The coincidence of two bounded pieces is understood
in the sense of the coincidence of the measures of lengths and angles, about
which, in the case of hyperbolic geometry, more will be said in the following section.
Moreover, Hilbert later proved that the hyperbolic plane cannot in this sense
be completely identified with any surface in three-dimensional space (much later it
was proved that it is possible for some surface in five-dimensional space).

³ For example, his first published book, Disquisitiones Arithmeticae, was considered for a long time
to be quite inaccessible.

⁴ More about this can be found, for example, in the book A Course of Differential Geometry and
Topology, by A. Mishchenko and A. Fomenko (Mir, 1988).

The model of hyperbolic geometry that we gave for the proof of Theorem 12.8
was constructed by Felix Klein (1849-1925) in 1870. The history of its appearance
was also astounding. Formally speaking, this model was constructed in 1859 by the
English mathematician Arthur Cayley (1821-1895). But he considered it only as a
certain construction in projective geometry and apparently did not notice the connection
with non-Euclidean geometry. In 1869, the young (twenty-year-old) Klein
became acquainted with his work. He recalled that in 1870, he gave a talk on the
work of Cayley at the seminar of the famous mathematician Weierstrass, and, as he
writes, “I finished with a question whether there might exist a connection between
the ideas of Cayley and Lobachevsky. I was given the answer that these two systems
were conceptually widely separated.” As Klein puts it, “I allowed myself to
be convinced by these objections and put aside this already mature idea.” However,
in 1871, he returned to this idea, formulated it mathematically, and published it.
But then his work was not understood by many. In particular, Cayley himself was
convinced as long as he lived that there was some logical error involved. Only after
several years were these ideas fully understood by mathematicians.

Of course, one can ask not only about the existence of Euclidean and hyperbolic
geometries, but also about a number of different (in a certain sense) geometries.
Here we shall formulate only the results that are relevant to the current discussion.⁵

First of all, we must give a precise sense to what we mean by “different” or
“identical” geometries. This can be done with the help of the notion of isomorphism
of geometries, which is analogous to the notion of isomorphism of vector spaces
introduced earlier. Within the framework of the system of axioms used in this section,
this can be done as follows. Let $\Pi$ and $\Pi'$ be two planes satisfying the axioms of
groups I-IV, and let G and G' be sets of motions of the respective planes. Mappings
$\varphi : \Pi \to \Pi'$ and $\psi : G \to G'$ define an isomorphism $(\varphi, \psi)$ of these geometries if
the following conditions are satisfied:

(1) Both mappings φ and ψ are bijections.

(2) The mapping φ takes every line l in the plane Π to some line φ(l) in the
plane Π′.

(3) The mapping φ preserves the relationship “lies between.” This means that if
points A, B, and C lie on the line l, with C lying between A and B, then the
point φ(C) lies between φ(A) and φ(B) on the line φ(l).

(4) The mappings φ and ψ agree in the following sense: for every motion f ∈ G,
its image ψ(f) is equal to φfφ⁻¹. This means that for every point A ∈ Π, the
equality (ψ(f))(φ(A)) = φ(f(A)) holds.


(5) For every motion f ∈ G, the equality ψ(f⁻¹) = ψ(f)⁻¹ holds, and for every
pair of motions f₁, f₂ ∈ G, we have ψ(f₁f₂) = ψ(f₁)ψ(f₂).

5 Their proofs are given in every course in higher geometry, for example, in the book
Higher Geometry, by N.V. Efimov, mentioned earlier.

Let us note that some of these conditions can be derived from the others, but for 
brevity, we shall not do this. 
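As one illustration of such a dependence, here is a short sketch showing that the multiplicative property of ψ follows from condition (4). It assumes that condition (5) is read as saying that ψ respects inverses and products, and it uses only the formula ψ(f) = φfφ⁻¹ together with the bijectivity of φ.

```latex
\[
\psi(f_1 f_2) = \varphi f_1 f_2 \varphi^{-1}
              = (\varphi f_1 \varphi^{-1})(\varphi f_2 \varphi^{-1})
              = \psi(f_1)\,\psi(f_2),
\]
\[
\psi(f^{-1}) = \varphi f^{-1} \varphi^{-1}
             = (\varphi f \varphi^{-1})^{-1}
             = \psi(f)^{-1}.
\]
```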

We shall consider geometries up to isomorphism as just described, that is, we
shall consider two geometries the same if there exists an isomorphism between
them. In particular, geometries with respective axioms 1 and 1′ in group V are
clearly not isomorphic to each other, that is, they are two different geometries. From
this point of view, geometries (in the plane) satisfying axioms 1 and 1′ are
fundamentally different from each other. Namely, it has been proved that all geometries
satisfying axiom 1 in group V are isomorphic.⁶ But geometries that satisfy axiom
1′ in group V are characterized up to isomorphism by a certain real number c called
their curvature. This number is usually assumed to be negative, and then it can take
on any value c < 0.

Klein suggested that Euclidean geometry can be viewed as the limiting case of
hyperbolic geometry as the curvature c approaches zero.⁷ As Klein further observed,
if axiom 1 (of Euclid) is satisfied in our world, then we shall never know it. Since
every physical measurement is taken with a certain degree of error, to establish the
precise equality c = 0 is impossible, for there always remains the possibility that the
number c is less than zero, but it is so small in absolute value that it lies beyond the
limits of our measurements.


12.3 Some Formulas of Hyperbolic Geometry* 

First of all, we shall define the distance between points in the hyperbolic plane using
its definition as the set of points of the projective plane P(L) corresponding to the
lines of the three-dimensional pseudo-Euclidean space L lying within the light cone
and its interpretation as the set of points inside the unit circle U in the affine
Euclidean plane E; see Sect. 12.1.

The meaning of the notion of distance is that it should be preserved under motions
of the hyperbolic plane. But we have defined a motion as a certain special
projective transformation P(𝒜) of the projective plane P(L). Theorem 9.16 shows
that in general, it is impossible to associate a number that does not change under
an arbitrary projective transformation not only with two points, but even with three
points of the projective line. But we shall use the fact that motions of the hyperbolic
plane are not arbitrary projective transformations of P(L), but only those that take the
light cone in the space L into itself.

Namely, to two arbitrary points A and B correspond the lines ⟨a⟩ and ⟨b⟩, lying
inside the light cone. We shall show that they determine two additional points, P
and Q, that correspond to lines lying on the light cone. But four points of a
projective space lying on a line already determine a number that does not change under
arbitrary projective transformations, namely their cross ratio (defined in Sect. 9.3).
We shall use this number for defining the distance between points A and B. This
definition has the special feature that it uses points corresponding to lines lying on
the light cone (P and Q), which are thus not points of the hyperbolic plane.

6 Of course, here we are assuming that they all satisfy the axioms of groups I-IV.

7 Felix Klein. Nicht-Euklidische Geometrie, Göttingen, 1893. Reprinted by AMS Chelsea, 2000.

Fig. 12.5 The segment [PQ]

We shall assume that the points A and B are distinct (if they coincide, then the
distance between them is zero by definition). This means that the vectors a and b are
linearly independent. It is obvious that then a unique projective line l passes through
these points; it corresponds to the plane L′ = ⟨a, b⟩. The line l determines a line l′
in the affine Euclidean space E, depicted in Figs. 12.1 and 12.2. Since the line l′
contains the points A and B, which lie inside the circle U, it intersects its boundary
in two points, which we shall take as P and Q. This was in fact already proved in
Sect. 12.1, but we shall now repeat the corresponding argument.

The points of l are the lines ⟨x⟩ consisting of all vectors proportional to the
vectors x = OA + t·AB, where t is an arbitrary real number. Here the vector OA
equals a, and the vector AB = c belongs to the subspace E₀ if we assume that the
points A, B and the line l′ lie in the affine space E. This means that x = a + tc,
where the vector c can be taken as fixed, and the number t as variable. Points x at
the intersection of the line l′ with the light cone V ⊂ L are given by the condition
(x²) = 0, that is,

$$((a + tc)^2) = (a^2) + 2(a, c)t + (c^2)t^2 = 0. \tag{12.18}$$

We know that (a²) < 0, and the vector c belongs to E₀. Since E₀ is a Euclidean
space and the points A and B are distinct, it follows that (c²) > 0. From this it
follows that the quadratic equation (12.18) in the unknown t has two real roots t₁
and t₂ of opposite signs, since their product (a²)/(c²) is negative. Suppose for the
sake of definiteness that t₁ < t₂. Then for t₁ < t < t₂, the value of ((a + tc)²) is
negative, and all points of the line l′ corresponding to the values t in this interval
belong to L. We see that the line l′ intersects the light cone V in two points
corresponding to the values t = t₁ and t = t₂, while the values t₁ < t < t₂ are
associated with the points of the line L₁ (that is, one-dimensional hyperbolic space)
passing through A and B. Thus the line L₁ coincides with the open segment of the
line l′ ⊂ E whose endpoints are P and Q, which correspond to the values t = t₁ and
t = t₂; see Fig. 12.5.

It is clear that point A is contained in the interval (P, Q). Applying the same
argument to the point B, we obtain that the point B is also in the interval (P, Q).

Let us label the points P and Q in such a way that P denotes the endpoint of
the interval (P, Q) that is closer (in the sense of Euclidean distance) to the point A,
and Q the endpoint that is closer to B, as depicted in Fig. 12.5.

Now it is possible to give a definition of the distance between points A and B,
which we shall denote by r(A, B):

$$r(A, B) = \log DV(A, B, Q, P), \tag{12.19}$$

where DV(A, B, Q, P) is the cross ratio (see p. 337). Let us note that in the
definition (12.19), we have not indicated the base of the logarithm. We could take any
base greater than 1, since a change in base results simply in multiplying all distances
by some fixed positive constant. But in any case, the length of a segment AB can be
defined only up to a multiplicative factor that corresponds to the arbitrariness in the
selection of a unit length on a line.
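As a numerical illustration of definition (12.19), the following Python sketch computes r(A, B) for two points of the open unit disk. It rests on several assumptions that are not part of the text: coordinates are chosen so that the disk is x² + y² < 1, the natural logarithm is taken as the (arbitrary) base, and the cross ratio is evaluated through Euclidean lengths as |AQ|·|PB| / (|BQ|·|PA|) for points arranged in the order P, A, B, Q. The function names are illustrative only.

```python
import math

def boundary_points(A, B):
    """Endpoints P, Q of the chord through A and B, ordered P, A, B, Q."""
    ax, ay = A
    cx, cy = B[0] - ax, B[1] - ay          # direction vector c = AB
    # Points of the line are A + t*c; they lie on the unit circle when
    # (c.c) t^2 + 2 (A.c) t + (A.A - 1) = 0, the analogue of (12.18).
    qa = cx * cx + cy * cy
    qb = 2.0 * (ax * cx + ay * cy)
    qc = ax * ax + ay * ay - 1.0           # negative, since A is inside
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    t1 = (-qb - root) / (2.0 * qa)         # t1 < 0: endpoint beyond A
    t2 = (-qb + root) / (2.0 * qa)         # t2 > 1: endpoint beyond B
    P = (ax + t1 * cx, ay + t1 * cy)
    Q = (ax + t2 * cx, ay + t2 * cy)
    return P, Q

def euclid(X, Y):
    """Euclidean distance between two points of the plane."""
    return math.hypot(X[0] - Y[0], X[1] - Y[1])

def hyperbolic_distance(A, B):
    """r(A, B) = log DV(A, B, Q, P), with the natural logarithm."""
    if A == B:
        return 0.0
    P, Q = boundary_points(A, B)
    dv = (euclid(A, Q) * euclid(P, B)) / (euclid(B, Q) * euclid(P, A))
    return math.log(dv)

if __name__ == "__main__":
    # From the center to (0.5, 0) the cross ratio is 3, so r = log 3.
    print(hyperbolic_distance((0.0, 0.0), (0.5, 0.0)))
```

For the center and the point (0.5, 0), the chord is the horizontal diameter, the cross ratio equals (1 · 1.5)/(0.5 · 1) = 3, and the computed distance is log 3. Moving B toward the boundary makes |BQ| small and the distance arbitrarily large.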

We shall explain a bit later why the logarithm appears in definition (12.19). The
reason for using the cross ratio is explained by the following theorem.

Theorem 12.9 The distance r(A, B) does not change under any motion f of the
hyperbolic plane, that is, r(f(A), f(B)) = r(A, B).

Proof The assertion of the theorem follows at once from the fact that a motion f of
the hyperbolic plane is determined by a certain projective transformation P(𝒜). This
transformation P(𝒜) carries the line l′ passing through points A and B to the line
passing through the points P(𝒜)(A) and P(𝒜)(B). This means that the transformation
takes the points P and Q, the intersection of the line l′ with the boundary of the
disk U, to the points P′ and Q′, the intersection of the line P(𝒜)(l′) with this
boundary. That is, P′ = P(𝒜)(P) and Q′ = P(𝒜)(Q), or conversely, Q′ = P(𝒜)(P) and
P′ = P(𝒜)(Q). Moreover, the transformation P(𝒜) preserves the cross ratio of four
points on a line (Theorem 9.17). □

To explain the role of the cross ratio, we jumped a bit ahead and skipped the
verification that the argument of the logarithm in formula (12.19) was a number
greater than 1 and also that in the definition of r(A, B), all the conditions entering
into the definition of a distance (p. xvii) were satisfied. We now return to this.

Let us assume that the points P, A, B, Q are arranged in the order shown in
Fig. 12.5. For the cross ratio, we may use formula (9.28),

$$DV(A, B, Q, P) = \frac{|AQ| \cdot |PB|}{|BQ| \cdot |PA|} > 1, \tag{12.20}$$

since clearly, |AQ| > |BQ| and |PB| > |PA|. Therefore, the argument of the
logarithm in formula (12.19) is a number greater than 1, and so the logarithm is a positive
real number. Therefore, r(A, B) > 0 for all pairs of distinct points A and B.

Let us note that it would be possible to make do without the order of the points P
and Q that we chose. For this, it would be sufficient to verify (this follows directly
from the definition of the cross ratio) that under a transposition of the points P and
Q, the cross ratio d is converted into 1/d. Thus the logarithm (12.19) that gives the
distance is defined up to sign, and we can define the distance as the absolute value.

If we interchange the positions of A and B, then the points P and Q defined in
the agreed-upon way also exchange places. It is easy to verify that the cross ratio
determines a distance according to formula (12.19) that will not change. In other
words, we have the equality

$$r(B, A) = r(A, B). \tag{12.21}$$


For any third point C collinear with A and B and lying between them, the condition

$$r(A, B) = r(A, C) + r(C, B) \tag{12.22}$$

is satisfied. It follows from the fact that (in the notation we have adopted)

$$DV(A, B, Q, P) = \frac{|AQ| \cdot |BP|}{|BQ| \cdot |AP|} = DV(A, C, Q, P) \cdot DV(C, B, Q, P), \tag{12.23}$$

since

$$DV(A, C, Q, P) = \frac{|AQ| \cdot |CP|}{|CQ| \cdot |AP|}, \qquad DV(C, B, Q, P) = \frac{|CQ| \cdot |BP|}{|BQ| \cdot |CP|}. \tag{12.24}$$

For the verification, it remains only to substitute the expressions (12.24) into
formula (12.23).
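Written out, the substitution is a single cancellation of the factors |CQ| and |CP|, after which taking logarithms gives the additivity (12.22). The display below is only a spelled-out version of that step, in the notation already adopted.

```latex
\[
DV(A,C,Q,P)\cdot DV(C,B,Q,P)
  = \frac{|AQ|\cdot|CP|}{|CQ|\cdot|AP|}\cdot\frac{|CQ|\cdot|BP|}{|BQ|\cdot|CP|}
  = \frac{|AQ|\cdot|BP|}{|BQ|\cdot|AP|}
  = DV(A,B,Q,P),
\]
\[
r(A,C) + r(C,B) = \log DV(A,C,Q,P) + \log DV(C,B,Q,P)
                = \log DV(A,B,Q,P) = r(A,B).
\]
```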

In any sufficiently complete course in geometry, it is proved without using the
parallel postulate (that is, in the framework of “absolute geometry”) that there exists
a function r(A, B) of a pair of points A and B that satisfies the following conditions:

1. r(A, B) > 0 if A ≠ B, and r(A, B) = 0 if A = B;

2. r(B, A) = r(A, B) for all points A and B;

3. r(A, B) = r(A, C) + r(C, B) for every point C collinear with A and B and lying
between them;

and most importantly,

4. the function r(A, B) is invariant under motions.

Using the definitions given at the beginning of this book, we may say in short that
r(A, B) is a metric on the set of points in the plane under consideration and motions
are isometries of this metric space.

Such a function is unique if we fix two distinct points A₀ and B₀ for which
r(A₀, B₀) = 1 (“unit of measurement”). This means that these assertions also hold
in hyperbolic geometry, and formula (12.19) defines this distance (and the base of
the logarithm in (12.19) is chosen in correspondence with the chosen “unit of
measurement”).

Every triple of points A, B, C satisfies the condition

$$r(A, B) \le r(A, C) + r(B, C). \tag{12.25}$$

This is the familiar triangle inequality, and in many courses in geometry, it is derived
without use of the parallel postulate, that is, as a theorem of “absolute geometry.”
Thus inequality (12.25) holds as well in hyperbolic geometry. But we shall now give
a direct (that is, resting directly on formula (12.19)) proof of this due to Hilbert.

Fig. 12.6 The triangle inequality

Let us recall that in the model that we have considered, the points of the
hyperbolic plane are points of the disk K in the Euclidean plane L, and the lines of the
hyperbolic plane are the line segments of the plane L that lie inside the disk K.

Let us consider three points A, B, C in the disk K. We shall denote the points
of intersection of a line passing through A and B with the boundary of the disk K
by P and Q, and the analogous points for the line passing through A and C will be
denoted by U and V, and for the line passing through B and C, by S and T. See
Fig. 12.6.

Let us denote the point of intersection of the line AB and the line SU by X, and
the point of intersection of the line AB and the line TV by Y. Then we have the
inequality

$$DV(A, B, Y, X) > DV(A, B, Q, P). \tag{12.26}$$

Indeed, the left-hand side of (12.26) is equal by definition to

$$DV(A, B, Y, X) = \frac{|AY| \cdot |BX|}{|BY| \cdot |AX|}, \tag{12.27}$$

and its right-hand side is given by the relationship (12.20). Therefore, inequality
(12.26) follows from the fact that

$$\frac{|AY|}{|BY|} > \frac{|AQ|}{|BQ|} \quad\text{and}\quad \frac{|BX|}{|AX|} > \frac{|BP|}{|AP|}. \tag{12.28}$$

Let us prove the first of inequalities (12.28). Let us define a = |AB|, t₁ = |BQ|,
and t₂ = |BY|. Then we obviously obtain the expressions |AQ|/|BQ| = (a + t₁)/t₁
and |AY|/|BY| = (a + t₂)/t₂. For a > 0, the function (a + t)/t in the variable t
decreases monotonically with increasing t, and therefore, from the fact that t₂ < t₁
(which is obvious from Fig. 12.6) follows the first of inequalities (12.28). Defining
a = |AB|, t₁ = |AX|, and t₂ = |AP|, using completely analogous arguments, we
may prove the second inequality of (12.28).

Let us denote the intersection of the lines SU and TV by W, let us connect this
point with the point C, and let us denote the point of intersection of the line thus
obtained with the line AB by D. Then the points X, A, D, Y and points U, A, C, V are
obtained from each other by a perspective mapping, just as was done for the points
Y, B, D, X and T, B, C, S. Then in view of Theorem 9.19, we have the relationships

$$\frac{|AY| \cdot |DX|}{|DY| \cdot |AX|} = \frac{|AV| \cdot |CU|}{|CV| \cdot |AU|}, \qquad \frac{|BX| \cdot |DY|}{|DX| \cdot |BY|} = \frac{|BS| \cdot |CT|}{|CS| \cdot |BT|}.$$

Multiplying these equalities, we have

$$\frac{|AY| \cdot |BX|}{|BY| \cdot |AX|} = \frac{|AV| \cdot |CU|}{|CV| \cdot |AU|} \cdot \frac{|BS| \cdot |CT|}{|CS| \cdot |BT|}.$$

Taking the logarithm of the last equality, and taking into account (12.27) for
DV(A, B, Y, X), the analogous expression for DV(A, C, V, U) and that for
DV(B, C, S, T), and definition (12.19), we obtain the relationship

$$\log DV(A, B, Y, X) = r(A, C) + r(B, C),$$

from which, taking into account (12.26), we obtain the required inequality (12.25).
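The inequality can also be probed numerically. The sketch below samples random triples of points in the unit disk and reports the smallest observed value of r(A, C) + r(B, C) − r(A, B). It is only an illustration under stated assumptions: the disk is taken as x² + y² < 1, the natural logarithm is used, and the cross ratio is computed from the parameters of the chord's endpoints; the names `klein_distance`, `random_point`, and `worst_slack` are hypothetical.

```python
import math
import random

def klein_distance(A, B):
    """r(A, B) = log DV(A, B, Q, P) for points of the open unit disk."""
    if A == B:
        return 0.0
    ax, ay = A
    cx, cy = B[0] - ax, B[1] - ay
    qa = cx * cx + cy * cy
    qb = 2.0 * (ax * cx + ay * cy)
    qc = ax * ax + ay * ay - 1.0
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    t1 = (-qb - root) / (2.0 * qa)   # parameter of P (t1 < 0)
    t2 = (-qb + root) / (2.0 * qa)   # parameter of Q (t2 > 1)
    # On the line A + t*c, Euclidean lengths are proportional to parameter
    # differences: |AQ| ~ t2, |PB| ~ 1 - t1, |BQ| ~ t2 - 1, |PA| ~ -t1.
    return math.log((t2 * (1.0 - t1)) / ((t2 - 1.0) * (-t1)))

def random_point(rng, radius=0.95):
    """Rejection-sample a point of the disk of the given radius."""
    while True:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y < radius * radius:
            return (x, y)

def worst_slack(trials=10_000, seed=0):
    """Smallest value of r(A,C) + r(B,C) - r(A,B) over random triples."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        A, B, C = (random_point(rng) for _ in range(3))
        slack = klein_distance(A, C) + klein_distance(B, C) - klein_distance(A, B)
        worst = min(worst, slack)
    return worst

if __name__ == "__main__":
    print(worst_slack())   # expected to be nonnegative up to rounding
```

A random search of this kind proves nothing, of course; it merely fails to find a counterexample, which is what the proof above guarantees.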

Let us note that if the point B approaches Q along the segment PQ (see
Fig. 12.6), then |BQ| approaches zero, and consequently, r(A, B) approaches
infinity. This means that despite the fact that the line passing through the points A
and B is represented in our figure by a segment of finite length, its length in the
hyperbolic plane is infinite.

The measurement of angles is similar to that of line segments. As we know, an
arbitrary point O on a line l partitions it into two half-lines. One half-line together
with the point O is called a ray h with center O. Two rays h and k with common
center O are called an angle; we shall assume that the ray h is obtained from k by a
counterclockwise rotation. This angle is denoted by ∠(h, k).

In “absolute geometry,” it is proved that for each angle with vertex at the point
O, there is a unique real number ∠(h, k) satisfying the following four conditions:

1. ∠(h, k) > 0 for all h ≠ k;

2. ∠(k, h) = ∠(h, k);

3. if f is a motion and f(h) = h′, f(k) = k′, and O′ = f(O) is the vertex of the
angle ∠(h′, k′), then ∠(h′, k′) = ∠(h, k).

To formulate the fourth property, we must introduce some additional concepts.
Let the rays h and k forming the angle ∠(h, k) lie on lines l₁ and l₂. The points in
the plane lying on the same side of the line l₁ as the points of the half-line k and on
the same side of the line l₂ as the points of the half-line h are called interior points
of the angle ∠(h, k). A ray l with the same center O as the rays h and k is said to
be an interior ray of the angle ∠(h, k) if it consists of interior points of this angle.
We can now formulate the last property:

4. If l is an interior ray of the angle ∠(h, k), then ∠(h, l) + ∠(l, k) = ∠(h, k).

As in the case of distance between points, the measure of an angle is defined
uniquely if we choose a “unit measurement,” that is, if we take a particular angle
∠(h₀, k₀) as the “unit angle measure.”

We shall point out an explicit method of defining the measure of angles in
hyperbolic geometry that is realized in the disk K given by the relationship x² + y² < 1
in the Euclidean plane L with coordinates x, y.

Let ∠(h′, k′) be the angle with center at the point O′, and let f be an arbitrary
motion taking the point O′ to the center O of the disk K. From the definitions, it is
obvious that f takes the half-lines h′ and k′ to some half-lines h and k with center at
the point O. Let us set the measure of ∠(h′, k′) equal to the Euclidean angle between
the half-lines h and k. The main difficulty in this definition is that it uses a motion
f, and therefore, we must prove that the measure of the angle thus obtained does not
depend on the choice of the motion f (of course, with the condition f(O′) = O).

Let g be another motion with the same property that g(O′) = O. Then g⁻¹(O) =
O′, and this means that fg⁻¹(O) = O, that is, the motion fg⁻¹ leaves the point O
fixed. As we saw in Sect. 12.1 (p. 438), a motion possessing such a property is
of type (a), which means that fg⁻¹ corresponds to an orthogonal transformation
of the Euclidean plane L; that is, the angle ∠(g(h′), g(k′)) is taken to the angle
∠(f(h′), f(k′)) via the orthogonal transformation fg⁻¹, which preserves the inner
product in L and therefore does not change the measure of angles. This proves the
correctness of the definition of angle measure that we have introduced. Equally easy
are the verifications of properties 1-3.

The best-known property of angles in hyperbolic geometry is the following.

Theorem 12.10 In hyperbolic geometry, the sum of the angles of a triangle is less
than two right angles, that is, less than π.

Since we are talking about a triangle, we can restrict our attention to the plane
in which this triangle lies and assume that we are working in the hyperbolic plane.
The key result is related to the fact that an angle ∠(h, k) in hyperbolic geometry
also determines a Euclidean angle, and we may then compare the measures of these
angles. We shall denote the measure of the angle ∠(h, k) in hyperbolic geometry, as
before, by ∠(h, k), and its Euclidean measure by ∠_E(h, k).

Lemma 12.11 If one ray of the angle ∠(h, k) (for example, h) passes through the
center O of the disk K, then the measure of this angle in the sense of hyperbolic
geometry is less than the Euclidean measure, that is,

$$\angle(h, k) < \angle_E(h, k). \tag{12.29}$$

First, we shall show how easily Theorem 12.10 follows from the lemma, and then 
we shall prove the lemma itself. 

Proof of Theorem 12.10 Let us denote the vertices of the triangle in question by
A, B, C. Since the measure of an angle is invariant under a motion, it follows by
Theorem 12.5 that we can choose a motion taking one of the vertices of the triangle
(for example, A) to the center O of the disk K. Let the vertices B and C be taken
to B′ and C′. See Fig. 12.7.

Fig. 12.7 A triangle in the hyperbolic plane

It suffices to prove the theorem for the triangle OB′C′. But for the angle
∠B′OC′, we have by definition the equality

$$\angle B'OC' = \angle_E B'OC',$$

and for the two remaining angles, we have by the lemma the inequalities

$$\angle OB'C' < \angle_E OB'C', \qquad \angle OC'B' < \angle_E OC'B'.$$

Adding, we obtain for the sum of the angles of triangle OB′C′ the inequality

$$\angle B'OC' + \angle OB'C' + \angle OC'B' < \angle_E B'OC' + \angle_E OB'C' + \angle_E OC'B'.$$

By a familiar theorem of Euclidean geometry, the sum on the right-hand side is
equal to π, and this proves Theorem 12.10. □


Proof of Lemma 12.11 We shall have to use the explicit form of the definition of the
measure of an angle. Let the ray h of the angle ∠(h, k) pass through the point O.
To describe the disk K, we shall introduce a Euclidean rectangular system of
coordinates (x, y) and assume that the vertex of angle ∠(h, k) is located at the point
O′ with coordinates (λ, 0), where λ ≠ 0. For this, it is necessary to execute a
rotation about the center of the disk in such a way that the point O′ is taken to
some point of the line y = 0 and use the fact that angles are invariant under such a
rotation.

Now we must write down explicitly a motion f of the hyperbolic plane taking the
point O′ to O. We already constructed such a motion in Sect. 12.1; see Example 12.4
on p. 439. There, we proved that there exists a motion of the hyperbolic plane that
takes the point with coordinates (x, y) to the point with coordinates (x′, y′), given
by the relationships

$$x' = \frac{ax + b}{bx + a}, \qquad y' = \frac{y}{bx + a}, \qquad a^2 - b^2 = 1. \tag{12.30}$$

Fig. 12.8 Angles in the hyperbolic plane



If we want the point O′ = (λ, 0) to be sent to the origin O = (0, 0), then we
should set aλ + b = 0, or equivalently, λ = −b/a. It is not difficult to verify that it
is possible to represent any number λ with |λ| < 1 in this form. Thus the mapping
(12.30) has the form

$$x' = \frac{x - \lambda}{1 - \lambda x}, \qquad y' = \frac{y}{a(1 - \lambda x)}. \tag{12.31}$$

Let the ray k intersect the y-axis at the point A with coordinates (0, μ); see Fig. 12.8.
(We note that this point is not required to be in the disk K.)
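To make the mapping (12.31) concrete, here is a small Python sketch that builds it for a given λ and checks three things numerically: that (λ, 0) goes to the origin, that the unit disk is mapped into itself, and that the cross-ratio distance is unchanged, in line with Theorem 12.9. The value a = 1/√(1 − λ²) follows from a² − b² = 1 with b = −aλ and a > 0. The distance routine assumes the natural logarithm and the disk x² + y² < 1, and all function names are illustrative.

```python
import math

def motion(lam):
    """Mapping (12.31); sends the point (lam, 0) to the origin."""
    a = 1.0 / math.sqrt(1.0 - lam * lam)   # from a^2 - b^2 = 1, b = -a*lam
    def f(point):
        x, y = point
        d = 1.0 - lam * x
        return ((x - lam) / d, y / (a * d))
    return f

def klein_distance(A, B):
    """r(A, B) = log DV(A, B, Q, P) for points of the open unit disk."""
    if A == B:
        return 0.0
    ax, ay = A
    cx, cy = B[0] - ax, B[1] - ay
    qa = cx * cx + cy * cy
    qb = 2.0 * (ax * cx + ay * cy)
    qc = ax * ax + ay * ay - 1.0
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    t1 = (-qb - root) / (2.0 * qa)   # parameter of P on the line A + t*(B - A)
    t2 = (-qb + root) / (2.0 * qa)   # parameter of Q
    return math.log((t2 * (1.0 - t1)) / ((t2 - 1.0) * (-t1)))

if __name__ == "__main__":
    f = motion(0.4)
    A, B = (0.1, 0.3), (-0.5, 0.2)
    print(f((0.4, 0.0)))                                   # the origin
    print(klein_distance(A, B), klein_distance(f(A), f(B)))  # equal up to rounding
```

The disk is preserved because x′² + y′² − 1 = (1 − λ²)(x² + y² − 1)/(1 − λx)², which has the same sign as x² + y² − 1.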

From formula (12.31), it is clear that our transformation takes a vertical line x = c
to a vertical line x = c′. The point O is taken to the point Ō = (−λ, 0), the point
A = (0, μ) to the point Ā = (−λ, μ/a), and the vertical line OA to the vertical line
ŌĀ. By the definition of an angle in hyperbolic geometry, ∠OO′A = ∠_E ŌOĀ.
The tangents of the Euclidean angles are known to us:

$$\tan(\angle_E OO'A) = \frac{\mu}{\lambda}, \qquad \tan(\angle_E \bar{O}O\bar{A}) = \frac{\mu/a}{\lambda} = \frac{\mu}{a\lambda};$$

see Fig. 12.8. Since a² = 1 + b², we have a > 1, and we see that in Euclidean
geometry, we have the inequality tan(∠_E ŌOĀ) < tan(∠_E OO′A). The tangent is a strictly
increasing function, and therefore we have the inequality ∠_E ŌOĀ < ∠_E OO′A for
angles that are Euclidean. But ∠OO′A = ∠_E ŌOĀ, and this means that ∠OO′A <
∠_E OO′A. □
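The lemma and Theorem 12.10 can be illustrated numerically. The sketch below measures a hyperbolic angle exactly as in the definition above: it moves the vertex to the center of the disk (a rotation followed by the mapping (12.31)) and then takes the Euclidean angle there. The disk x² + y² < 1, the helper names, and the sample points are assumptions made for the illustration.

```python
import math

def to_center(V):
    """A motion of the disk taking V to the origin: rotate V onto the
    positive x-axis, then apply (12.31) with lam = |V|."""
    lam = math.hypot(V[0], V[1])
    if lam == 0.0:
        return lambda p: p
    cos_t, sin_t = V[0] / lam, V[1] / lam
    a = 1.0 / math.sqrt(1.0 - lam * lam)
    def f(p):
        # Rotation taking V to (lam, 0).
        x = cos_t * p[0] + sin_t * p[1]
        y = -sin_t * p[0] + cos_t * p[1]
        d = 1.0 - lam * x
        return ((x - lam) / d, y / (a * d))
    return f

def euclidean_angle(V, X, Y):
    """Euclidean angle at V between the rays VX and VY, in [0, pi]."""
    ux, uy = X[0] - V[0], X[1] - V[1]
    wx, wy = Y[0] - V[0], Y[1] - V[1]
    return abs(math.atan2(ux * wy - uy * wx, ux * wx + uy * wy))

def hyperbolic_angle(V, X, Y):
    """Hyperbolic angle at V: the Euclidean angle after moving V to O."""
    f = to_center(V)
    return euclidean_angle((0.0, 0.0), f(X), f(Y))

def angle_sum(A, B, C):
    """Sum of the three hyperbolic angles of triangle ABC."""
    return (hyperbolic_angle(A, B, C) + hyperbolic_angle(B, A, C)
            + hyperbolic_angle(C, A, B))

if __name__ == "__main__":
    # Lemma 12.11 with O' = (0.5, 0): one ray through O, one through (0, 0.5).
    print(hyperbolic_angle((0.5, 0.0), (0.0, 0.0), (0.0, 0.5)),
          euclidean_angle((0.5, 0.0), (0.0, 0.0), (0.0, 0.5)))
    # Theorem 12.10 for a sample triangle with one vertex at the center.
    print(angle_sum((0.0, 0.0), (0.6, 0.0), (0.0, 0.6)), math.pi)
```

In the first example the Euclidean angle is π/4, while the hyperbolic angle is arctan(μ/(aλ)) with λ = μ = 0.5, which is smaller because a > 1.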


It is of interest to compare Theorem 12.10 with the analogous result for spherical
geometry. We have not yet encountered spherical geometry in this course, even
though it was developed in detail much earlier than hyperbolic geometry, indeed
in antiquity. In spherical geometry, the role of lines is played by great circles on
the sphere, that is, sections of the sphere obtained by all possible planes passing
through its center. The analogy between great circles on the sphere and lines in the
plane consists in the fact that the arc of the great circle joining points A and B has
length no greater than that of any other curve on the sphere with endpoints A and B.
This arc length of a great circle (which, of course, depends also on the radius R of
the sphere) is called the distance on the sphere from point A to point B.

Fig. 12.9 A triangle on the sphere



The measurement of lengths and angles on the sphere can generally be defined
in exactly the same way as in Euclidean or hyperbolic geometry. Here the angle
between two “lines” (that is, great circles) is equal to the value of the dihedral angle
formed by the planes passing through these great circles. We have the following
result.

Theorem 12.12 The sum of the angles of a triangle on the sphere is greater than
two right angles, that is, greater than π.

Proof Let there be given a triangle with vertices A, B, C on a sphere of radius R.
Let us draw all the great circles whose arcs are the sides AB, AC, and BC of triangle
ABC. See Fig. 12.9.

Let us denote by Σ_A the part of the sphere enclosed between the great circle
passing through the points A, B and the great circle passing through A, C. We
introduce the analogous notation Σ_B and Σ_C. Let us denote by A the measure of the
dihedral angle BAC and similarly for B and C. Then the assertion of the theorem
is equivalent to asserting that A + B + C > π.

But it is easy to see that the area of Σ_A is the same fraction of the area of the
sphere as 2A is of 2π. Since the area of the sphere is equal to 4πR², it follows that
the area of Σ_A is equal to

$$\frac{2A}{2\pi} \cdot 4\pi R^2 = 4R^2 A.$$

Similarly, we obtain expressions for the areas of Σ_B and Σ_C; they are equal to 4R²B
and 4R²C respectively. Let us now observe that the regions Σ_A, Σ_B, and Σ_C
together cover the entire sphere. Here each point of the sphere not part of triangle
ABC or of triangle A′B′C′ symmetric to it on the sphere belongs to only one of
the regions Σ_A, Σ_B, and Σ_C, and every point in triangle ABC or the symmetric
triangle A′B′C′ is contained in all three regions. We therefore have

$$4R^2(A + B + C) = 4\pi R^2 + 2S_{\triangle ABC} + 2S_{\triangle A'B'C'} = 4\pi R^2 + 4S_{\triangle ABC}.$$

From this we obtain the relationship

$$A + B + C = \pi + \frac{S_{\triangle ABC}}{R^2}, \tag{12.32}$$

from which it follows that A + B + C > π. □
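Formula (12.32) is easy to test numerically. In the sketch below, the angles of a spherical triangle are computed as dihedral angles between the planes through the center, and the area is computed independently by the Van Oosterom-Strackee solid-angle formula, which is not part of the text and is brought in only as a cross-check. The vertices are assumed to be unit vectors from the center of the sphere, the triangle is assumed to be small enough that each side is shorter than half a great circle, and the function names are illustrative.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def vertex_angle(a, b, c):
    """Dihedral angle at vertex a between the planes (O, a, b) and (O, a, c)."""
    n1, n2 = cross(a, b), cross(a, c)
    cosine = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    return math.acos(max(-1.0, min(1.0, cosine)))

def angle_sum(a, b, c):
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)

def triangle_area(a, b, c, R=1.0):
    """Area of the spherical triangle with unit-vector vertices a, b, c on a
    sphere of radius R, via the solid angle it subtends at the center."""
    triple = abs(dot(a, cross(b, c)))
    denom = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2.0 * math.atan2(triple, denom) * R * R

if __name__ == "__main__":
    s = 1.0 / math.sqrt(3.0)
    a, b, c = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (s, s, s)
    R = 2.0
    # Both sides of (12.32): the angle sum, and pi + S / R^2.
    print(angle_sum(a, b, c), math.pi + triangle_area(a, b, c, R) / R**2)
```

For the octant triangle with vertices on the three coordinate axes, every angle is π/2 and the area is one eighth of the sphere, so both sides of (12.32) equal 3π/2.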

Formula (12.32) gives an example of a series of relationships systematically
developed by Lobachevsky: if we were to assume that R² < 0 (that is, R is a purely
imaginary number), then clearly, we would obtain from (12.32) the inequality

$$A + B + C < \pi,$$

which is Theorem 12.10 of hyperbolic geometry. This is why Lobachevsky
considered that his geometry is realized “on a sphere of imaginary radius.” However,
the analogy between theorems obtained on the basis of the negation of the “fifth
postulate” and formulas obtained from those of spherical geometry by replacing R²
with a negative number had been already noted by many mathematicians working
on these questions (some even as early as the eighteenth century).

The reader should be warned that spherical geometry is entirely inconsistent with
the system of axioms that we considered in Sect. 12.2. It violates one of the
fundamental axioms of relationship: on the sphere, several different lines can pass
through two distinct points. Indeed, infinitely many great circles pass through any
two antipodal points on the sphere. In connection with this, Riemann proposed
another geometry less radically different from Euclidean geometry. We shall describe
it in the two-dimensional case.

For this, we shall use a description of the projective plane Π as the collection of
all lines in three-dimensional space passing through some point O. Let us consider
the sphere S with center at O. Every point P ∈ S together with the center O of
the sphere determines a line l, that is, some point Q of the projective plane Π. The
association P ↦ Q defines a mapping of the sphere S to the projective plane Π
whereby great circles on the sphere are taken precisely to lines of Π. Clearly, exactly
two points of the sphere are mapped to a single point Q ∈ Π: together with the point
P, there is also the second point of the intersection of the line l with the sphere, that
is, the antipodal point P′. But Euclidean motions taking the sphere S into itself (we
might call them motions of spherical geometry) give certain transformations defined
on the projective plane Π and satisfying the axioms of motion. It is possible as well
to transfer the measures of lengths and angles from the sphere S to the projective
plane Π. Then we have the analogue of Theorem 12.12 from spherical geometry.

This branch of geometry is called elliptic geometry.⁸ In elliptic geometry, every
pair of lines intersect, since such is the case in the projective plane. Thus there are no
parallel lines. However, in “absolute geometry,” it is proved that there exists at least
one line passing through any given point A not lying on a given line l that is parallel
to l. This means that in elliptic geometry, not all the axioms of “absolute geometry”
are satisfied. The reason for this is easily ascertained: in elliptic geometry, there
is no natural concept of “lying between.” Indeed, a great circle of the sphere S is
mapped to a line l of the projective plane Π, where two antipodal points of the
sphere (A and A′, B and B′, C and C′, and so on) are taken to one point of the
plane Π. See Fig. 12.10. It is clear from the figure that in elliptic geometry, we may
assume equally well that the point C does or does not lie between A and B.

8 Elliptic geometry is sometimes called Riemannian geometry, but that term is usually reserved for
the branch of differential geometry that studies Riemannian manifolds.

Fig. 12.10 Elliptic geometry

Nevertheless, elliptic geometry possesses the property of “free mobility.”
Moreover, one can prove (Helmholtz-Lie theorem) that among all geometries (assuming
some rigorous definition of this term), only three of them (Euclidean, hyperbolic,
and elliptic) possess this property.


Chapter 13 

Groups, Rings, and Modules 


13.1 Groups and Homomorphisms 

The concept of a group is defined axiomatically, analogously to the notions of
vector, inner product, and affine space. Such an abstract definition is justified by the
wealth of examples of groups throughout all of mathematics.

Definition 13.1 A group is a set G on which is defined an operation that assigns
to each pair of elements of this set some third element; that is, there is defined
a mapping G × G → G. The element associated with the elements g₁ and g₂ by
this rule is called their product and is denoted by g₁ · g₂ or simply g₁g₂. For this
mapping, the following conditions must also be satisfied:

(1) There exists an element e ∈ G such that for every g ∈ G, we have the
relationships eg = g and ge = g. This element is called the identity.¹

(2) For each element g ∈ G, there exist an element g′ ∈ G such that gg′ = e and an
element g″ ∈ G such that g″g = e. The element g′ is called a right inverse, and
the element g″ is called a left inverse of the element g.

(3) For every triple of elements g₁, g₂, g₃ ∈ G, the following relationship holds:

$$(g_1 g_2) g_3 = g_1 (g_2 g_3). \tag{13.1}$$

This last property is called associativity, and it is a property that we have already
met repeatedly, for example in connection with the composition of mappings and
matrix multiplication, and also in the construction of the exterior algebra. We
considered the associative property in its most general form on p. xv, where we proved
that equality (13.1) makes it possible to define the product of an arbitrary number
of factors g₁g₂ ⋯ g_k, which then depends only on the order of the factors and not
on the arrangement of parentheses in the product. The reasoning given there applies,
obviously, to every group.

1 The identity element of a group is unique. Indeed, if there existed another identity element e′ ∈ G,
then by definition, we would have the equalities ee′ = e′ and ee′ = e, from which it follows that
e = e′.

The condition of associativity has other important consequences. From it derives,
for example, the fact that if $g'$ is a right inverse of $g$, and $g''$ is a left inverse,
then

$g''(g g') = g'' e = g'', \qquad g''(g g') = (g'' g) g' = e g' = g',$

from which it follows that $g' = g''$. Thus the left and right inverses of any given
element $g \in G$ coincide. This unique element $g' = g''$ is called simply the inverse
of $g$ and is denoted by $g^{-1}$.
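
As a small illustration of Definition 13.1, the following Python sketch checks the three group axioms by brute force on a finite set. The function name `is_group` and the sample sets (residues modulo a small number) are assumptions made only for this illustration; they do not come from the text.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of conditions (1)-(3) of Definition 13.1 on a finite set."""
    elements = list(elements)
    # The operation must be a mapping G x G -> G.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # (1) an identity e with eg = g and ge = g for every g
    identities = [e for e in elements
                  if all(op(e, g) == g and op(g, e) == g for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # (2) a right inverse and a left inverse for every g
    for g in elements:
        if not any(op(g, h) == e for h in elements):
            return False
        if not any(op(h, g) == e for h in elements):
            return False
    # (3) associativity, equality (13.1)
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elements, repeat=3))

# Residues modulo 5 under addition, and nonzero residues under multiplication.
print(is_group(range(5), lambda a, b: (a + b) % 5))
print(is_group(range(1, 5), lambda a, b: (a * b) % 5))
```

The check is cubic in the size of the set, so it is meant only for very small examples.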

Definition 13.2 If the number of elements belonging to a group $G$ is finite, then the
group $G$ is called a finite group, and otherwise, it is called an infinite group. The
number of distinct elements in a finite group $G$ is called its order and is denoted by
$|G|$.

Let $M$ be an arbitrary set, and let us consider the collection of all bijective
mappings between $M$ and itself. Such mappings are also called transformations of the
set $M$. In the introductory section of this book, we defined the operation of
composition (that is, the sequential application) of arbitrary mappings of arbitrary sets
(p. xiv). It follows from the properties proved there that the collection of all
transformations of a set $M$ together with the operation of composition forms a group,
where the inverse of each transformation $f : M \to M$ is given by the inverse
mapping $f^{-1} : M \to M$, while the identity is obviously given by the identity mapping
on the set $M$. Such groups are called transformation groups, and it is with these that
the majority of applications of groups are associated.

It is sometimes necessary to consider not all the transformations of a set, but to
limit our consideration to some subset. The situation that thus arises can be
formulated conveniently as follows:

Definition 13.3 A subset $G' \subset G$ of elements of a group $G$ is called a subgroup of
$G$ if the following conditions are satisfied:

(a) For every pair of elements $g_1, g_2 \in G'$, their product $g_1 g_2$ is again in $G'$.

(b) $G'$ contains the identity element $e$.

(c) For every $g \in G'$, its inverse $g^{-1}$ is again in $G'$.

It is obvious that a subgroup $G'$ is itself a group. Thus from the group of all
transformations, we obtain a set of examples (indeed, the majority of examples of
groups). Let us enumerate some that are met most frequently.

Example 13.4 The following sets are groups under the operation of composition of 
mappings. 

1. the set of nonsingular linear transformations of a vector space; 

2. the set of orthogonal transformations of a Euclidean space; 

3. the set of proper orthogonal transformations of a Euclidean space;

4. the set of Lorentz transformations of a pseudo-Euclidean space; 

5. the set of nonsingular affine transformations of an affine space; 

6. the set of projective transformations of a projective space; 

7. the set of motions of an affine Euclidean space; 

8. the set of motions of a hyperbolic space. 

All the groups enumerated above are groups of transformations (the set $M$ is
obviously the underlying set of the given space). Let us note that in the case of
vector and affine spaces, there is the crucial requirement of the nonsingularity of the
linear or affine transformations that guarantees the bijectivity of each mapping and
thus the existence of an inverse element for each element of the group.²

However, not all naturally occurring groups are groups of transformations. For
example, with respect to the operation of addition, the set of all integers forms a
group, as do the sets of the rational, real, and complex numbers, and likewise, the
set of all vectors belonging to any arbitrary vector space.

Let us remark that the axioms of motion 1, 2, and 3 introduced in Sect. 12.2 
can be expressed together as a single requirement, namely that the motions form a 
group. 


Example 13.5 Let us consider a finite set $M$ consisting of $n$ elements. A
transformation $f : M \to M$ is called a permutation, and the group of all permutations of the
set $M$ is called the symmetric group of degree $n$ and is denoted by $S_n$. It is obvious
that the group $S_n$ is finite.

We considered permutations earlier, in Sect. 2.6, in connection with the notions
of symmetric and antisymmetric functions, and we saw that for defining a
permutation $f : M \to M$, one can introduce a numeration of the elements of the set $M$,
that is, one can write the set in the form $M = \{a_1, \ldots, a_n\}$ and designate the
images $f(a_1), \ldots, f(a_n)$ of all the elements $a_1, \ldots, a_n$. Namely, let $f(a_1) = a_{j_1}, \ldots,
f(a_n) = a_{j_n}$. Then a permutation is defined by the matrix

$\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix},$ (13.2)

where in the upper row are written in succession all the natural numbers from 1
to $n$, and in the lower row, under the number $k$ stands the number $j_k$ such that
$f(a_k) = a_{j_k}$. Since a permutation $f : M \to M$ is a bijective mapping, it follows that
the lower row contains all the numbers from 1 to $n$, except that they are written in
some other order. In other words, $(j_1, \ldots, j_n)$ is some permutation of the numbers
$(1, \ldots, n)$.


² Unfortunately, there is a certain amount of disagreement over terminology, of which the reader
should be aware: above, we defined a transformation of a set as a bijective mapping into itself, while
at the same time, a linear (or affine) transformation of a vector (or affine) space is not by definition
necessarily bijective, and to have bijectivity here, it is necessary to specify that the transformations
be nonsingular.

Writing a permutation in the form (13.2) allows us in particular to ascertain easily
that $|S_n| = n!$. Let us prove this by induction on $n$. For $n = 1$, this is obvious: the
group $S_1$ contains the single permutation that is the identity mapping on the set $M$
consisting of a single element. Let $n > 1$. Then by enumerating the elements of the
set $M$ in every possible way, we obtain a bijection between $S_n$ and the set of
matrices $A$ of the form (13.2), whose first row contains the elements $1, \ldots, n$, and the
elements $j_1, \ldots, j_n$ of the second row take all possible values from 1 to $n$. Let $A'$ be
the matrix obtained from $A$ by deleting its last column, containing the element $j_n$.
Let us fix this element: $j_n = k$. Then the elements $j_1, \ldots, j_{n-1}$ of the matrix $A'$
assume all possible values from the collection of the $n - 1$ numbers $(1, \ldots, \breve{k}, \ldots, n)$,
where the symbol $\breve{\ }$, as before, denotes the omission of the corresponding element.
It is clear that the set of all possible matrices $A'$ is in bijective correspondence with
$S_{n-1}$, and by the induction hypothesis, the number of distinct matrices $A'$ is equal to
$|S_{n-1}| = (n - 1)!$. But since the element $j_n = k$ can be equal to any natural number
from 1 to $n$, the number of distinct matrices $A$ is equal to $n(n - 1)! = n!$. This gives
us the equality $|S_n| = n!$.
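
The count $|S_n| = n!$ can be checked for small $n$ by listing all lower rows of the matrix (13.2). The sketch below is only an illustration: the names `symmetric_group` and `compose` are assumptions of this example, and a permutation is stored simply as the tuple $(j_1, \ldots, j_n)$.

```python
from itertools import permutations
from math import factorial

def symmetric_group(n):
    """All lower rows (j_1, ..., j_n) of the two-row matrix (13.2)."""
    return list(permutations(range(1, n + 1)))

def compose(f, g):
    """Lower row of the product f g: first apply g, then f."""
    return tuple(f[g[k] - 1] for k in range(len(g)))

for n in range(1, 7):
    # compare the number of permutations with n!
    print(n, len(symmetric_group(n)), factorial(n))
```

With `compose` as the operation, the list returned by `symmetric_group(n)` is closed under products, in agreement with the fact that $S_n$ is a group.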

Let us note that the numeration of the elements of the set $M$ used for writing
down permutations plays the same role as the introduction of coordinates (that is, a
basis) in a vector space. Furthermore, the matrix (13.2) is analogous to the matrix
of a linear transformation of a space, which is defined only after the choice of a
basis and depends on that choice. However, for our further purposes, it will be more
convenient to use concepts that are not connected with such a choice of numeration
of elements.

We shall use the concept of transposition, which was introduced in Sect. 2.6
(p. 45). The definition given there can be formulated as follows. Let $a$ and $b$ be two
distinct elements of the set $M$. Then a transposition is a permutation of the set $M$
that interchanges the places of the elements $a$ and $b$ and leaves all other elements of
the set $M$ fixed. Denoting such a transposition by $\tau_{a,b}$, we can express this definition
by the relationships

$\tau_{a,b}(a) = b, \quad \tau_{a,b}(b) = a, \quad \tau_{a,b}(x) = x$ (13.3)

for all $x \ne a$ and $x \ne b$.

In this notation, Theorem 2.23 from Sect. 2.6 can be formulated as follows: every
permutation $g$ of a finite set is the product of a finite number of transpositions, that
is,

$g = \tau_{a_1,b_1} \tau_{a_2,b_2} \cdots \tau_{a_k,b_k}.$ (13.4)

As we saw in Sect. 2.6, in relationship (13.4), the number $k$ and the choice of
elements $a_1, b_1, \ldots, a_k, b_k$ for the given permutation $g$ are not uniquely defined. This
means that for a given permutation $g$, the representation (13.4) is not unique.
However, as was proved in Sect. 2.6 (Theorem 2.25), the parity of the number $k$ for a
permutation $g$ is uniquely determined. Permutations for which the number $k$ in the
representation (13.4) is even are called even, and those for which the number $k$ is
odd are called odd.
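
A representation of the form (13.4) can be produced mechanically by sorting the list of images with swaps. The following sketch is one possible way to do this, not the argument of Sect. 2.6; the function names are assumptions of this example, and for convenience the set $M$ is taken to be $\{0, 1, \ldots, n-1\}$. Each recorded swap is a transposition, and the permutation is the product of these transpositions taken in reverse order.

```python
from itertools import permutations

def to_transpositions(perm):
    """Sort the tuple of images by swaps and return the swaps (i, j) used.

    perm[k] is the image of k, for k = 0, ..., n-1.
    """
    cur = list(perm)
    swaps = []
    for i in range(len(cur)):
        if cur[i] != i:
            j = cur.index(i)               # position currently holding the value i
            cur[i], cur[j] = cur[j], cur[i]
            swaps.append((i, j))
    return swaps

def from_transpositions(n, swaps):
    """Rebuild the permutation by replaying the swaps in reverse order."""
    cur = list(range(n))
    for i, j in reversed(swaps):
        cur[i], cur[j] = cur[j], cur[i]
    return tuple(cur)

def is_even(perm):
    """Parity of the number k of transpositions in the representation found above."""
    return len(to_transpositions(perm)) % 2 == 0

# Count the even permutations of four elements.
print(sum(is_even(p) for p in permutations(range(4))))
```

By Theorem 2.25, the parity reported by `is_even` does not depend on the particular decomposition that the sorting procedure happens to find.
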

Example 13.6 The collection of all even permutations of $n$ elements forms a
subgroup of the symmetric group $S_n$ (it obviously satisfies conditions (a), (b), (c) in
the definition of a subgroup). It is called the alternating group of degree $n$ and is
denoted by $A_n$.

Definition 13.7 Let $g$ be an element of $G$. Then for every natural number $n$, the
element $g^n = g \cdots g$ ($n$-fold product) is defined. For a negative integer $m$, the element
$g^m$ is equal to $(g^{-1})^{-m}$, and for zero, we have $g^0 = e$.

It is easily verified that for arbitrary integers $m$ and $n$, we have the relationship

$g^m g^n = g^{m+n}.$

From this, it is clear that the collection of elements of the form $g^n$, where $n$ runs
over the set of integers, forms a subgroup. It is called the cyclic subgroup generated
by the element $g$ and is denoted by $\{g\}$.

There are two cases that can occur: 

(a) All the elements $g^n$, as $n$ runs through the set of integers, are distinct. In this
case, we say that $g$ is an element of infinite order in the group $G$.

(b) For some integers $m$ and $n$, $m \ne n$, we have the equality $g^m = g^n$. Then,
obviously, $g^{m-n} = e$. This means that there exists a natural number $k$ (for instance
$|m - n|$) such that $g^k = e$. In this case, we say that $g$ is an element of finite order
in the group $G$.

If $g$ is an element of finite order, then the smallest natural number $k$ such that
$g^k = e$ is called the order of the element $g$. If for some integer $n$, we have $g^n = e$,
then the number $n$ is an integer multiple of the order $k$ of the element $g$. Indeed,
if such were not the case, then we could divide the number $n$ by $k$ with nonzero
remainder: $n = qk + r$, where $0 < r < k$. From the equalities $g^n = e$ and $g^k = e$, we
could conclude that $g^r = e$, in contradiction to the definition of the order $k$. If in the
group $G$ there exists an element $g$ such that $G = \{g\}$, then the group $G$ is called a
cyclic group. It is obvious that if $G = \{g\}$ and the element $g$ has finite order $k$, then
$|G| = k$. Indeed, in this case, $e, g, g^2, \ldots, g^{k-1}$ are all the distinct elements of the
group $G$.
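
The order of an element and the cyclic subgroup it generates are easy to compute directly from the definitions. In the sketch below, the function names and the choice of the multiplicative group of nonzero residues modulo 7 are assumptions made for illustration only.

```python
def element_order(g, op, e):
    """Smallest natural number k with g^k = e; assumes g has finite order."""
    k, power = 1, g
    while power != e:
        power = op(power, g)
        k += 1
    return k

def cyclic_subgroup(g, op, e):
    """The elements e, g, g^2, ..., g^(k-1) of {g} for g of finite order k."""
    elems, power = [e], g
    while power != e:
        elems.append(power)
        power = op(power, g)
    return elems

mul7 = lambda a, b: (a * b) % 7      # nonzero residues modulo 7 under multiplication

print(element_order(2, mul7, 1), cyclic_subgroup(2, mul7, 1))
print(element_order(3, mul7, 1), cyclic_subgroup(3, mul7, 1))
```

In this example the element 3 generates the whole group, so the group is cyclic, while 2 generates a proper cyclic subgroup.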

Now we shall move on to discuss mappings of groups (homomorphisms), which
play a role in group theory analogous to that of linear transformations of vector
spaces in linear algebra. Let $G$ and $G'$ be any two groups, and let $e \in G$ and $e' \in G'$
be their identity elements.

Definition 13.8 A mapping $f : G \to G'$ is called a homomorphism if for every pair
of elements $g_1$ and $g_2$ of the group $G$, we have the relationship

$f(g_1 g_2) = f(g_1) f(g_2),$ (13.5)

where it is obviously implied that on the left- and right-hand sides of equality (13.5),
the juxtaposition of elements indicates the multiplication operation in the respective
group (on the left, in $G$; on the right, in $G'$).

From equality (13.5), it is easy to derive the simplest properties of
homomorphisms:

1. $f(e) = e'$;

2. $f(g^{-1}) = (f(g))^{-1}$ for every $g \in G$;

3. $f(g^n) = (f(g))^n$ for every $g \in G$ and every integer $n$.

For the proof of the first property, let us set $g_1 = g_2 = e$ in formula (13.5). Then
taking into account the equality $e = ee$, which is obvious from the definition of the
identity element, we obtain that

$f(e) = f(ee) = f(e) f(e).$

It remains only to multiply both sides of the relationship $f(e) = f(e) f(e)$ by the
element $(f(e))^{-1}$ of the group $G'$, after which we obtain the required equality
$e' = f(e)$. The second property follows at once from the first: setting in (13.5) $g_1 = g$
and $g_2 = g^{-1}$, and taking into account the equality $e = g g^{-1}$, we obtain

$e' = f(e) = f(g g^{-1}) = f(g) f(g^{-1}),$

from which, by the definition of the inverse element, it follows that $f(g^{-1}) =
(f(g))^{-1}$. Finally, the third property is obtained for positive $n$ by induction from
(13.5), and for negative $n$, it is also necessary to apply property 2.

Definition 13.9 A mapping $f : G \to G'$ is called an isomorphism if it is a
homomorphism that is also a bijection. Groups $G$ and $G'$ are said to be isomorphic if
there exists an isomorphism $f : G \to G'$. This is denoted as follows: $G \simeq G'$.

Example 13.10 Assigning to each nonsingular linear transformation of a vector
space $L$ of dimension $n$ its matrix (in some fixed basis of the space $L$), we obtain an
isomorphism between the group of nonsingular linear transformations of this space
and the group of nonsingular square matrices of order $n$.

The notion of isomorphism plays the same role in group theory as the notion of
isomorphism plays in the theory of vector spaces, and the notion of homomorphism
plays the same role as the notion of arbitrary linear transformation (in vector spaces
of arbitrary dimension). The analogy between these concepts is revealed particularly
in the fact that the answer to the question whether a homomorphism $f : G \to G'$ is
an isomorphism can be formulated in terms of its image and kernel, just as was the
case for linear mappings.

The image of a homomorphism $f$ is the set $f(G)$, that is, simply the image of
$f$ as a mapping of sets $G \to G'$. It follows from relationship (13.5) that $f(G)$ is a
subgroup of $G'$. The kernel of a homomorphism $f$ is the set of elements $g \in G$ such
that $f(g) = e'$. It is likewise not difficult to conclude from (13.5) that the kernel is a
subgroup of $G$.

Using the notions of image and kernel, we may say that a homomorphism
$f : G \to G'$ is an isomorphism if and only if its image consists of the entire group
$G'$ and its kernel consists of only the identity element $e \in G$. The proof of this
assertion is based on relationship (13.5) and properties 1 and 2: if for two
elements $g_1$ and $g_2$ of a group $G$, we have the equality $f(g_1) = f(g_2)$, then through
right multiplying both sides by the element $(f(g_1))^{-1}$ of the group $G'$, we obtain
$e' = f(g_2)(f(g_1))^{-1} = f(g_2 g_1^{-1})$, from which, the kernel being trivial, it follows
that $g_2 g_1^{-1} = e$, that is,

$g_1 = g_2.$
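
The criterion in terms of image and kernel can be tried out on a small example. The sketch below works in the additive group of residues modulo 12, so the identity is 0 and relationship (13.5) reads $f(a + b) = f(a) + f(b)$; the helper names and the two sample mappings are assumptions of this illustration.

```python
N = 12
G = range(N)                          # additive group of residues modulo 12

def is_homomorphism(f):
    """Relationship (13.5), written additively, for a mapping f : G -> G."""
    return all(f((a + b) % N) == (f(a) + f(b)) % N for a in G for b in G)

def kernel(f):
    return [g for g in G if f(g) == 0]

def image(f):
    return sorted({f(g) for g in G})

def is_isomorphism(f):
    """Homomorphism whose image is all of G and whose kernel is only the identity."""
    return is_homomorphism(f) and kernel(f) == [0] and image(f) == list(G)

triple = lambda g: (3 * g) % N        # a homomorphism that is not an isomorphism
quint = lambda g: (5 * g) % N         # 5 is relatively prime to 12

print(kernel(triple), image(triple), is_isomorphism(triple))
print(kernel(quint), is_isomorphism(quint))
```

Both the kernel and the image of `triple` are subgroups, as the text asserts for every homomorphism.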

It is important, however, to note that the analogy between isomorphisms of
groups and isomorphisms of vector spaces does not extend all that far: most of the
theorems from Chap. 3 do not have suitable analogues for groups, even for finite
groups. For example, one of the most important results of Chap. 3 (Theorem 3.64)
states that all vector spaces of a given finite dimension are isomorphic to one
another. But there exist even finite groups of a given order that are not isomorphic; see
Example 13.24 on p. 484.

Another property of groups is related to whether the product of elements in a
group depends on the order in which they are multiplied. In the definition of a group,
no condition of this sort was imposed, and therefore, we may assume that in general,
$g_1 g_2 \ne g_2 g_1$. Very frequently, such is the case. For example, nonsingular square
matrices of a given order $n$ with the standard operation of matrix multiplication
form a group, and as the example presented in Sect. 2.9 on p. 64 shows, already for
$n = 2$, it is generally the case that $AB \ne BA$.

Definition 13.11 If in a group $G$ the equality $g_1 g_2 = g_2 g_1$ holds for every pair of
elements $g_1, g_2 \in G$, then $G$ is called a commutative group or, more usually, an
abelian group.³

³ Named in honor of the Norwegian mathematician Niels Henrik Abel (1802-1829).

For example, the groups of integers, rational numbers, real numbers, and complex
numbers with the operation of addition are all abelian. Likewise, a vector space is
an abelian group with respect to the operation of vector addition. It is easy to see
that every cyclic group is abelian.

Let us present one result that holds for all finite groups but that is especially easy
to prove (and we shall use it frequently in the sequel) for abelian groups.

Lemma 13.12 For every finite abelian group $G$, the order of each of its elements
divides the order of the group.

Proof Let us denote by $g_1, g_2, \ldots, g_n$ the complete set of elements of $G$ (so we
obviously have $n = |G|$), and let us right multiply each of them by some element
$g \in G$. The elements thus obtained, $g_1 g, g_2 g, \ldots, g_n g$, will again all be distinct.
Indeed, given the equality $g_i g = g_j g$, right multiplying both sides by $g^{-1}$ yields the
equality $g_i = g_j$. Since the group $G$ contains $n$ elements altogether, it follows that
the elements $g_1 g, g_2 g, \ldots, g_n g$ are the same as the elements $g_1, g_2, \ldots, g_n$, though
perhaps arranged in some other order:

$g_1 g = g_{i_1}, \quad g_2 g = g_{i_2}, \quad \ldots, \quad g_n g = g_{i_n}.$

On multiplying these equalities, we obtain

$(g_1 g)(g_2 g) \cdots (g_n g) = g_{i_1} g_{i_2} \cdots g_{i_n}.$ (13.6)

Since the group $G$ is abelian, we have

$(g_1 g)(g_2 g) \cdots (g_n g) = g_1 g_2 \cdots g_n g^n,$

and since $g_{i_1}, g_{i_2}, \ldots, g_{i_n}$ are the same elements $g_1, g_2, \ldots, g_n$, then setting $h =
g_1 g_2 \cdots g_n$, we obtain from (13.6) the equality $h g^n = h$. Left multiplying both sides
of the last equality by $h^{-1}$, we obtain $g^n = e$. As we saw above, it then follows that
the order of the element $g$ divides the number $n = |G|$. □
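
The two steps of this proof can be observed numerically. The sketch below uses the abelian group of invertible residues modulo $n$ under multiplication; this choice of group and the names `units` and `check_lemma` are assumptions made for the illustration.

```python
from math import gcd

def units(n):
    """Residues modulo n that are invertible under multiplication (an abelian group)."""
    return [x for x in range(1, n) if gcd(x, n) == 1]

def check_lemma(n):
    """Check both steps of the proof of Lemma 13.12 in the group units(n)."""
    G = units(n)
    order = len(G)
    for g in G:
        # Right multiplication by g only rearranges the elements of G ...
        assert sorted((x * g) % n for x in G) == G
        # ... and hence g raised to the power |G| is the identity.
        assert pow(g, order, n) == 1
    return order

print(check_lemma(15))
```

Since $g^{|G|} = e$ for every $g$, the order of each element divides $|G|$, as stated in the lemma.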

Definition 13.13 Let $H_1, H_2, \ldots, H_r$ be subgroups of $G$. The group $G$ is called
the direct product of the subgroups $H_1, H_2, \ldots, H_r$ if for all elements $h_i \in H_i$ and
$h_j \in H_j$ from distinct subgroups, we have the relationship $h_i h_j = h_j h_i$, and every
element $g \in G$ can be represented in the form

$g = h_1 h_2 \cdots h_r, \quad h_i \in H_i, \quad i = 1, 2, \ldots, r,$

and for each element $g \in G$, such a representation is unique. The fact that the group
$G$ is a direct product of subgroups $H_1, H_2, \ldots, H_r$ is denoted by

$G = H_1 \times H_2 \times \cdots \times H_r.$ (13.7)

In the case of abelian groups, a different terminology is usually used, related to
the majority of examples of interest. Namely, the operation defined on the group
is called addition instead of multiplication, and it is denoted not by $g_1 g_2$, but by
$g_1 + g_2$. In keeping with this notation, the identity element is called the zero element
and is denoted by 0, and not by $e$. The inverse element is called the negative or
additive inverse and is denoted not by $g^{-1}$, but by $-g$, and the exponential notation
$g^n$ is replaced by the multiplicative notation $ng$, which is defined similarly: $ng =
g + \cdots + g$ ($n$-fold sum) if $n > 0$, by $ng = (-g) + \cdots + (-g)$ ($(-n)$-fold sum) if $n < 0$,
and by $ng = 0$ if $n = 0$. The definition of homomorphism remains exactly the same
in this case, where it is required only to replace in formula (13.5) the symbol for the
group operation:

$f(g_1 + g_2) = f(g_1) + f(g_2).$

Properties 1-3 here take the following form:

1. $f(0) = 0'$;

2. $f(-g) = -f(g)$ for all $g \in G$;

3. $f(ng) = n f(g)$ for all $g \in G$ and for every integer $n$.

This terminology agrees with the example of the set of integers and, in the
terminology we employed earlier, the example of vectors that form an abelian group with
respect to the operation of addition.

In the case of abelian groups (with the operation of addition), instead of the
direct product of subgroups $H_1, H_2, \ldots, H_r$ one speaks of their direct sum. Then
the definition of the direct sum reduces to the condition that every element $g \in G$
can be represented in the form

$g = h_1 + h_2 + \cdots + h_r, \quad h_i \in H_i, \quad i = 1, 2, \ldots, r,$

and that for each element $g \in G$, the representation is unique. It is obvious that this
last requirement is equivalent to the requirement that the equality $h_1 + h_2 + \cdots +
h_r = 0$ be possible only if $h_1 = 0, h_2 = 0, \ldots, h_r = 0$. That a group $G$ is the direct
sum of subgroups $H_1, H_2, \ldots, H_r$ is denoted by

$G = H_1 \oplus H_2 \oplus \cdots \oplus H_r.$ (13.8)

It is obvious that in both cases (13.7) and (13.8), the order of the group $G$ is equal
to

$|G| = |H_1| \cdot |H_2| \cdots |H_r|.$

In perfect analogy to how things were done in Sect. 3.1 for vector spaces, we may
define the direct product (or direct sum) of groups that in general are not originally
the subgroups of any particular group and that even, perhaps, are of completely
different natures from one another.
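
A direct sum of groups that are not given as subgroups of a common group can be modeled with tuples and componentwise addition. The sketch below does this for groups of residues; the names `direct_sum` and `element_orders` are assumptions of this illustration. Comparing the orders of elements also gives a quick way to see that two groups of the same order need not be isomorphic.

```python
from itertools import product

def direct_sum(moduli):
    """Elements of Z_{n_1} + ... + Z_{n_r} as tuples, with componentwise addition."""
    elements = list(product(*(range(n) for n in moduli)))
    def add(x, y):
        return tuple((a + b) % n for a, b, n in zip(x, y, moduli))
    zero = tuple(0 for _ in moduli)
    return elements, add, zero

def element_orders(elements, add, zero):
    """Sorted list of the orders of all elements of a finite additive group."""
    orders = []
    for g in elements:
        k, s = 1, g
        while s != zero:
            s = add(s, g)
            k += 1
        orders.append(k)
    return sorted(orders)

print(len(direct_sum([2, 3])[0]))            # the order is the product 2 * 3
print(element_orders(*direct_sum([2, 2])))   # no element of order 4
print(element_orders(*direct_sum([4])))      # the cyclic group of order 4
```

An isomorphism preserves the order of every element, so the last two groups, both of order 4, cannot be isomorphic.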

Example 13.14 If we map every orthogonal transformation $\mathcal{U}$ of a Euclidean space
to its determinant $|\mathcal{U}|$, which, as we know, is equal to $+1$ or $-1$, we obtain a
homomorphism of the group of orthogonal transformations into the symmetric group
$S_2$ of order 2. If we map every Lorentz transformation $\mathcal{U}$ of a pseudo-Euclidean
space to the pair of numbers $\varepsilon(\mathcal{U}) = (|\mathcal{U}|, \nu(\mathcal{U}))$, defined in Sect. 7.8, we obtain a
homomorphism of the group of Lorentz transformations into the group $S_2 \times S_2$.

Example 13.15 Let $(V, L)$ be an affine Euclidean space of dimension $n$ and $G$ the
group of its motions. Then the assertion of Theorem 8.37 can be formulated as the
equality $G = T_n \times O_n$, where $T_n$ is the group of translations of the space $V$, and $O_n$
is the group of orthogonal transformations of the space $L$. Let us note that $T_n \simeq L$,
where $L$ is understood as a group under the operation of vector addition. Indeed, let
us define the mapping $f : T_n \to L$ that to each translation $T_a$ by the vector $a$ assigns
this vector $a$. Obviously, the mapping $f$ is bijective, and by virtue of the property
$T_a T_b = T_{a+b}$, it is an isomorphism. Thus Theorem 8.37 can be formulated as the
relationship $G \simeq L \times O_n$.


13.2 Decomposition of Finite Abelian Groups

Later in this chapter we shall restrict our attention to the study of finite groups.
The highest goal in this area of group theory is to find a construction that gives a
description of all finite groups. But such a goal is far from accessible; at least at
present, we are far from attaining it. However, for finite abelian groups, the answer
to this question turns out to be unexpectedly simple. Moreover, both the answer and
its proof are very similar to Theorem 5.12 on the decomposition of a vector space
as a direct sum of cyclic subspaces. For the proof, we shall require the following
lemmas.

Lemma 13.16 Let $B$ be a subgroup of $A$, and $a$ an element of the group $A$ of
order $k$. If there exists a number $m \in \mathbb{N}$ relatively prime to $k$ such that $ma \in B$, then
$a$ is an element of $B$.

Proof Since the numbers $m$ and $k$ are relatively prime, there exist integers $r$ and $s$
such that $kr + ms = 1$. Multiplying $ma$ by $s$ and adding $kra$ to the result (which is
equal to zero, since $k$ is the order of the element $a$), we obtain $a$. But $sma = s(ma)$
belongs to the subgroup $B$. From this, it follows that $a$ is also an element of $B$. □
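
The integers $r$ and $s$ with $kr + ms = 1$ used in this proof can be found by the extended Euclidean algorithm, after which $a = s(ma)$. The sketch below carries this out in the additive group of residues modulo $n$; the function names and the numerical example are assumptions of this illustration.

```python
def ext_gcd(x, y):
    """Return (g, r, s) with g = gcd(x, y) = x*r + y*s, for nonnegative x, y."""
    if y == 0:
        return x, 1, 0
    g, r, s = ext_gcd(y, x % y)
    return g, s, r - (x // y) * s

def recover(ma, m, k, n):
    """Recover a in Z_n from ma, given that a has order k and gcd(m, k) = 1."""
    g, r, s = ext_gcd(k, m)            # k*r + m*s = 1
    assert g == 1, "m and k must be relatively prime"
    # a = (k*r + m*s) a = r (k a) + s (m a) = s (m a), because k a = 0
    return (s * ma) % n

# In Z_12 the element a = 4 has order k = 3; take m = 5, so that ma = 8.
print(recover(8, 5, 3, 12))
```

Since $a$ is obtained as an integer multiple of $ma$, it lies in every subgroup that contains $ma$, which is the content of the lemma.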

Lemma 13.17 If $A = \{a\}$ is a cyclic group of order $n$, and we set $b = ma$, where
$m \in \mathbb{N}$ is relatively prime to $n$, then the cyclic subgroup $B = \{b\}$ generated by the
element $b$ coincides with $A$.

Proof Since $a \in A$, we have by Lemma 13.12 that the order $k$ of the element $a$
divides the order of the group $A$, which is equal to $n$, and the relative primality
of the numbers $m$ and $n$ implies the relative primality of the numbers $k$ and $m$.
Since $ma = b \in B$, it follows from Lemma 13.16 that $a \in B$, which means that
$A \subset B$, and since we obviously have also $B \subset A$, we obtain the required equality
$B = A$. □

Corollary 13.18 Under the assumptions of Lemma 13.17, every element $c \in A$ can
be expressed in the form

$c = md, \quad d \in A, \quad m \in \mathbb{Z}.$ (13.9)

Indeed, if in the notation of Lemma 13.17, the group $A$ is the group $\{b\}$, then the
element $c$ has the form $kb$, and since $b = ma$, we obtain equality (13.9) in which
$d = ka$.
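
Lemma 13.17 and Corollary 13.18 can be checked in the cyclic group of residues modulo $n$ under addition, generated by $a = 1$. In the sketch below, the names `generated` and `solve` are assumptions of this illustration, and the element $d$ of (13.9) is found with the help of a modular inverse rather than by the argument of the text.

```python
def generated(b, n):
    """The cyclic subgroup {b} of the additive group Z_n."""
    return sorted({(k * b) % n for k in range(n)})

def solve(c, m, n):
    """Find d in Z_n with m d = c, assuming m is relatively prime to n."""
    return (pow(m, -1, n) * c) % n     # modular inverse; needs Python 3.8+

n = 9
print(generated(2, n))                 # 2 is relatively prime to 9
print(generated(3, n))                 # 3 is not, and {3} is a proper subgroup
print(solve(5, 2, n))                  # an element d with 2 d = 5 in Z_9
```

The second line of output shows that the assumption of relative primality in Lemma 13.17 cannot simply be dropped.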

Definition 13.19 A subgroup $B$ of a group $A$ is said to be maximal if $B \ne A$ and $B$
is contained in no subgroup other than $A$ and $B$ itself.

It is obvious that there exist maximal subgroups in every finite group that consists
of more than just a single element. Indeed, beginning with the identity subgroup
(that is, the subgroup consisting of a single element), we can include it, if it is
not itself maximal, in some larger subgroup $B_1$ different from $A$. If in $B_1$ we have not
yet obtained a maximal subgroup, then we can include it in some larger subgroup $B_2$
different from $A$. Continuing this process, we eventually can go no further, since
all the subgroups $B_1, B_2, \ldots$ are contained in the finite group $A$. The last subgroup
obtained when we stop the process will be maximal. We remark that we do not assert
(nor is it true) that the maximal subgroup we have constructed is unique.

Lemma 13.20 For every maximal subgroup $B$ of a finite abelian group $A$, there
exists an element $a \in A$ not belonging to $B$ such that the smallest number $m \in \mathbb{N}$ for
which $ma$ belongs to $B$ is prime, and every element $x \in A$ can be represented in the
form

$x = ka + b,$ (13.10)

for $k$ an integer, $b \in B$.

Later, we shall denote the prime number $m$ that appears in Lemma 13.20 by $p$.

Proof of Lemma 13.20 Let us take as $a$ any element of the group $A$ not belonging
to the subgroup $B$. The collection of all elements of the form $ka + b$, where $k$ is
an arbitrary integer and $b$ an arbitrary element of $B$, obviously forms a subgroup
containing $B$ (the elements of $B$ are obtained by taking $k = 0$ in the
representation $x = ka + b$). It is obvious that this subgroup
does not coincide with $B$, since it contains the element $a$ (for $k = 1$ and $b = 0$), and
this means, in view of the maximality of the subgroup $B$, that it coincides with $A$.
From this follows the representation (13.10) for every element $x$ in the group $A$.

It remains to prove that for some prime number $p$, the element $pa$ belongs to $B$.
Since the element $a$ is of finite order, we must have $na = 0$ for some $n > 0$. In
particular, $na \in B$. Let us take the smallest $m \in \mathbb{N}$ for which $ma \in B$ and prove that
it is prime.

Suppose that such is not the case, and that $p$ is a prime divisor of $m$. Then $m =
p m_1$ for some integer $m_1 < m$. Let us set $a_1 = m_1 a$. As we have seen, the collection
of all elements of the form $k a_1 + b$ (for arbitrary integer $k$ and $b \in B$) forms a
subgroup of the group $A$ containing $B$. If the element $a_1$ were contained in $B$,
then that would contradict the choice of $m$ as the smallest natural number such that
$ma \in B$. This means that $a_1 \notin B$, and in view of the maximality of the subgroup $B$,
the subgroup that we constructed of elements of the form $k a_1 + b$ coincides with $A$.
In particular, it contains the element $a$, that is, $a = k a_1 + b$ for some $k$ and $b$. From
this, it follows that $pa = kp a_1 + pb$. But $p a_1 = p m_1 a = ma \in B$, and since $pb \in B$,
this means that $pa \in B$, which, since $p < m$, contradicts the minimality of $m$. This means that the
assumption that $m$ has prime divisors less than $m$ is false, and so $m = p$ is a prime
number. □

Remark 13.21 We chose as $a$ an arbitrary element of the group $A$ not contained
in $B$. In particular, in place of $a$, we could as well choose any element $a' = a + b$,
where $b \in B$. Indeed, from $a = a' - b$ and $a' \in B$ it would follow that we would
also have $a \in B$.


We can now state the fundamental theorem of abelian groups.

Theorem 13.22 Every finite abelian group is the direct sum of cyclic subgroups
whose orders are equal to powers of prime numbers.

Thus, the theorem asserts that every finite abelian group $A$ has the decomposition

$A = A_1 \oplus \cdots \oplus A_r,$ (13.11)

where the subgroups $A_i$ are cyclic, that is, $A_i = \{a_i\}$, and their orders are powers of
prime numbers, that is, $|A_i| = p_i^{m_i}$, where $p_i$ are prime numbers.
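
For the special case of a cyclic group, a decomposition of the form (13.11) can be written down explicitly. The sketch below treats only the group of residues modulo $n$ under addition and relies on the Chinese remainder theorem rather than on the inductive proof of the text; the function names are assumptions of this illustration.

```python
def prime_power_factors(n):
    """Split n into prime powers, e.g. 360 -> [8, 9, 5]."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def components(x, factors):
    """Image of x in Z_{q_1} + ... + Z_{q_r} under reduction modulo each q_i."""
    return tuple(x % q for q in factors)

n = 360
qs = prime_power_factors(n)
images = {components(x, qs) for x in range(n)}
# The mapping is a bijection when the number of distinct images equals n.
print(qs, len(images) == n)
```

The mapping `components` respects addition and is bijective, so it is an isomorphism of the cyclic group of order $n$ onto a direct sum of cyclic groups of prime power orders. Noncyclic groups require the full argument of Theorem 13.22.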

Proof of Theorem 13.22 Our proof is by induction on the order of the group $A$. For
the group of order 1, the theorem is obvious. Therefore, to prove the theorem for a
group $A$, we may assume that it has been proved for all subgroups $B \subset A$, $B \ne A$,
since for an arbitrary subset $B \subset A$ with $B \ne A$, the number of elements of $B$ is less
than $|A|$.

In particular, let $B$ be a maximal subgroup of the group $A$. By the induction
hypothesis, the theorem is valid for this subgroup, and it therefore has the
decomposition

$B = C_1 \oplus \cdots \oplus C_r,$ (13.12)

in which the $C_i$ are cyclic subgroups each of which has order the power of a prime
number:

$C_i = \{c_i\}, \quad p_i^{m_i} c_i = 0.$

Lemma 13.20 holds for the subgroup $B$; let $a \in A$, $a \notin B$, be the element provided
for in the formulation of this lemma. By hypothesis, every element $x \in B$ can be
represented in the form

$x = k_1 c_1 + \cdots + k_r c_r.$

In particular, this holds for the element $b = pa$ (in the notation of Lemma 13.20):

$pa = k_1 c_1 + \cdots + k_r c_r.$

Let us select the terms $k_i c_i$ in this decomposition that can be written in the form
$p d_i$, where $d_i \in C_i$. These are first of all the terms $k_i c_i$ for $i$ such that $p_i \ne p$.
This follows from Corollary 13.18. Moreover, all elements of the form $k_i c_i$ possess
this property if $p_i = p$ and $k_i$ is divisible by $p$. Let the chosen elements be $k_i c_i$,
$i = 1, \ldots, s - 1$. Then for the remaining elements $k_i c_i$, $i = s, \ldots, r$, we have $p_i = p$
and $k_i$ is not divisible by $p$. Setting

$k_i c_i = p d_i, \quad d_i \in C_i, \quad i = 1, \ldots, s - 1, \qquad d_1 + \cdots + d_{s-1} = d,$ (13.13)

we obtain

$pa = pd + k_s c_s + \cdots + k_r c_r.$

We can now use the freedom in the choice of the element $a \in A$, which was
mentioned in Remark 13.21, and take instead of $a$ the element $a' = a - d$, since $d \in B$
in view of formula (13.13). We then have

$pa' = k_s c_s + \cdots + k_r c_r.$ (13.14)

There are now two possible cases.

Case 1. The number $s - 1$ is equal to $r$, and then equality (13.14) gives

$pa' = 0.$

In this case, the group $A$ decomposes as a direct sum of cyclic subgroups as follows:

$A = C_1 \oplus \cdots \oplus C_r \oplus C_{r+1},$

where $C_{r+1} = \{a'\}$ is a subgroup of order $p$.

Indeed, Lemma 13.20 asserts that every element $x \in A$ can be represented in the
form $ka' + b$, and since in view of (13.12), the element $b$ can be represented in the
form

$b = k_1 c_1 + \cdots + k_r c_r,$

it follows that $x$ has the form

$x = k_1 c_1 + \cdots + k_r c_r + ka'.$ (13.15)

This proves the first condition in the definition of a direct sum.

Let us prove the uniqueness of representation (13.15). For this, it suffices to prove
that the equality

$k_1 c_1 + \cdots + k_r c_r + ka' = 0$ (13.16)

is possible only for $k_1 c_1 = \cdots = k_r c_r = ka' = 0$. Let us rewrite (13.16) in the form

$ka' = -k_1 c_1 - \cdots - k_r c_r.$ (13.17)

This means that the element $ka'$ belongs to $B$. If the number $k$ were not divisible by
$p$, then $k$ and $p$ would be relatively prime, and since the element $a'$ has order $p$, by
Lemma 13.16 we would then obtain that $a' \in B$. But this contradicts the choice of
the element $a$ and the construction of the element $a'$. This means that $p$ must divide
$k$, and since $pa' = 0$, it follows that we also have $ka' = 0$. Thus equality (13.17) is
reduced to $k_1 c_1 + \cdots + k_r c_r = 0$, and from the fact that the group $B$ is the direct
sum of subgroups $C_1, \ldots, C_r$, we obtain that $k_1 c_1 = 0, \ldots, k_r c_r = 0$.

Case 2. The number $s - 1$ is less than $r$. Let us set $k_s c_s = d_s, \ldots, k_r c_r = d_r$, and
for $i = 1, \ldots, s - 1$, let us set $c_i = d_i$. By Lemma 13.17, the element $d_i$ generates
the same cyclic subgroup $C_i$ as $c_i$. For $i \le s - 1$, this assertion is a tautology, and
for $i > s - 1$, it follows from the fact that the numbers $k_i$ are by assumption not
divisible by $p$, and $p^{m_i} c_i = 0$ for all $i \ge s$. Equality (13.14) can then be rewritten as
follows:

$pa' = d_s + \cdots + d_r.$ (13.18)

Let $m_s \le \cdots \le m_r$. Let us denote by $C'_r$ the cyclic group generated by the element
$a'$, that is, let us set $C'_r = \{a'\}$. Let us prove that the order of the element $a'$, and
therefore the order of the group $C'_r$, is equal to $p^{m_r + 1}$:

$$|C'_r| = p^{m_r + 1}. \tag{13.19}$$

Indeed, in view of (13.18), we have

$$p^{m_r + 1} a' = p^{m_r} d_s + \cdots + p^{m_r} d_r = 0,$$

since $p^{m_i} d_i = 0$ and $m_i \le m_r$. On the other hand, in view of relationship (13.18), we
have

$$p^{m_r} a' = p^{m_r - 1} d_s + \cdots + p^{m_r - 1} d_r \ne 0,$$

since $p^{m_r - 1} d_r \ne 0$, and in view of (13.12), the sum of the elements $p^{m_r - 1} d_i \in C_i$
cannot equal 0 if at least one term is not equal to 0. This proves (13.19).

Now let us prove that

$$A = C_1 \oplus \cdots \oplus C_{r-1} \oplus C'_r, \tag{13.20}$$

that is, that every element $x \in A$ can be uniquely represented in the form

$$x = y_1 + \cdots + y_{r-1} + y'_r, \quad y_1 \in C_1, \ldots, y_{r-1} \in C_{r-1},\ y'_r \in C'_r. \tag{13.21}$$

First let us prove the possibility of representation (13.21). Since every element
$x \in A$ can be represented in the form $ka' + b$, $b \in B$, it suffices to prove that it is
possible to represent separately $a'$ and an arbitrary element $b \in B$ in the form (13.21).
This is obvious for the element $a'$, since it belongs to the cyclic group $C'_r = \{a'\}$. As
for elements of $B$, each $b \in B$ can be represented in the form

$$b = k_1 d_1 + \cdots + k_r d_r$$

according to formula (13.12) and in view of the fact that $C_i = \{d_i\}$. Therefore, it
suffices to prove that each of the elements $d_i$ can be represented in the form (13.21).
For $d_1, \ldots, d_{r-1}$, this is obvious, since

$$d_i \in C_i = \{d_i\}, \quad i = 1, \ldots, r - 1.$$

Finally, in view of (13.18), we have

$$d_r = -d_s - \cdots - d_{r-1} + pa',$$

and this is the representation of the element $d_r$ that we need.

Let us now prove the uniqueness of representation (13.21). For this, it suffices to
prove that the equality

$$k_1 d_1 + \cdots + k_{r-1} d_{r-1} + k_r a' = 0 \tag{13.22}$$

is possible only for $k_1 d_1 = \cdots = k_{r-1} d_{r-1} = k_r a' = 0$. Let us suppose that $k_r$ is
relatively prime to $p$. Then

$$k_r a' = -k_1 d_1 - \cdots - k_{r-1} d_{r-1},$$

and in view of the fact that $p^{m_r + 1} a' = 0$, we obtain by Lemma 13.16 that $a' \in B$.
But the element $a \in A$ was chosen as an element not belonging to the subgroup $B$.
This means that the element $a'$ also does not belong to $B$. This contradiction shows
that $k_r$ cannot be relatively prime to $p$.

Let us now consider the case in which the number $k_r$ is divisible by $p$. Let $k_r = pl$.
Then

$$pl a' = -k_1 d_1 - \cdots - k_{r-1} d_{r-1}.$$

Let us replace $pa'$ on the left-hand side of this relationship by the expression
$d_s + \cdots + d_r$ on the basis of equality (13.18). On transferring all terms to the
left-hand side, we obtain

$$l d_s + \cdots + l d_r + k_1 d_1 + \cdots + k_{r-1} d_{r-1} = 0.$$

From the fact that by hypothesis, the group $B$ is the direct sum of the groups $C_1, \ldots, C_r$,
it follows that in this equality, $l d_r = 0$. Since the order of the element $d_r$ is equal
to $p^{m_r}$, this is possible only if $p^{m_r}$ divides $l$, and this means that $p^{m_r + 1}$ divides $k_r$.
But we have seen that the order of the element $a'$ is equal to $p^{m_r + 1}$, and this means
that $k_r a' = 0$. Then it follows from equality (13.22) that $k_1 d_1 + \cdots + k_{r-1} d_{r-1} = 0$.
And since by the induction hypothesis, the group $B$ is the direct sum of the groups
$C_1, \ldots, C_r$, it follows that $k_1 d_1 = \cdots = k_{r-1} d_{r-1} = 0$. This completes the proof of
the theorem. □


13.3 The Uniqueness of the Decomposition

The theorem on the uniqueness of the Jordan normal form has an analogue in the 
theory of finite abelian groups. 

Theorem 13.23 For different decompositions of the finite abelian group $A$ into a
direct sum of cyclic subgroups whose orders are prime powers, whose existence is
established in Theorem 13.22,

$$A = A_1 \oplus \cdots \oplus A_r, \quad |A_i| = p_i^{m_i},$$

the orders $p_i^{m_i}$ of the cyclic subgroups $A_i$ are unique. In other words, if

$$A = A'_1 \oplus \cdots \oplus A'_s \tag{13.23}$$

is another such decomposition, then $s = r$, and the subgroups $A'_i$ can be reordered
in such a way that the equality $|A'_i| = |A_i|$ is satisfied for all $i = 1, \ldots, r$.


Proof We shall show how the orders of the cyclic subgroups in the decomposition
(13.23) are uniquely determined by the group $A$ itself. For any natural number $k$, let
us denote by $kA$ the collection of elements $a$ of the group $A$ that can be represented
in the form $a = kb$, where $b$ is some element of this group. It is obvious that the
collection of elements $kA$ forms a subgroup of the group $A$. Let us prove that the
orders $|kA|$ of these subgroups (for various $k$) determine the orders of the cyclic
groups $|A_i|$ in the decomposition (13.23).

Let us consider an arbitrary prime number $p$ and analyze the case that $k$ is a
power of the prime number $p$, that is, $k = p^i$. Let us factor the order $|p^i A|$ of the
group $p^i A$ into a product of a power of $p$ and a number $n_i$ relatively prime to $p$:

$$|p^i A| = p^{r_i} n_i, \quad (n_i, p) = 1. \tag{13.24}$$

On the other hand, for a prime number $p$, let us denote by $l_i$ the number of subgroups
$A_j$ of order $p^i$ appearing in the decomposition (13.23). We shall present an explicit
formula that expresses the numbers $l_i$ in terms of the $r_i$. Since these latter numbers are
determined only by the group $A$, it follows that the numbers $l_i$ also do not depend
on the decomposition (13.23) (in particular, they are equal to zero if and only if all
prime numbers $p_i$ for which $|A_i| = p_i^{m_i}$ differ from $p$).

First of all, let us calculate the order of the group $A$ in another way. Let us note
that $A = p^0 A$, so that this is the case $i = 0$. The definition of the numbers $l_i$ shows
that in the decomposition (13.23), we have $l_1$ groups of order $p$, $l_2$ groups of order
$p^2$, ..., and the remaining groups have orders relatively prime to $p$. Hence it follows
that

$$|A| = p^{l_1} p^{2 l_2} \cdots n_0, \quad (n_0, p) = 1.$$

Let us set

$$|A| = p^{r_0} n_0, \quad (n_0, p) = 1.$$

Then we can write the relationship above in the form

$$l_1 + 2 l_2 + 3 l_3 + \cdots = r_0. \tag{13.25}$$

Now let us consider the case that $k = p^i > 1$, that is, the number $i$ is greater
than 0. First of all, it is obvious that for every natural number $k$, it follows from
(13.23) that

$$kA = kA_1 \oplus kA_2 \oplus \cdots \oplus kA_r.$$

It is obvious that all properties of a direct sum are satisfied.

Now, as in the case examined above, let us calculate the order of the group $p^i A$
in another way. It is obvious that $|p^i A| = |p^i A_1| \cdots |p^i A_r|$. If for some $j$, we have
$|A_j| = p_j^{m_j}$ and $p_j \ne p$, then Lemma 13.17 shows that $p^i A_j = A_j$, and we have
$|p^i A_j| = |A_j| = p_j^{m_j}$, which is relatively prime to $p$. Thus in the decomposition
$|p^i A| = |p^i A_1| \cdots |p^i A_r|$, all the factors $|p^i A_j|$, where $|A_j| = p_j^{m_j}$ and $p_j \ne p$,
together give a number that is relatively prime to $p$, and in formula (13.24), they
make no contribution to the number $r_i$. It remains to consider the case $p_j = p$. Since
$A_j$ is a cyclic group, it follows that $A_j = \{a_j\}$. It is then clear that $p^i A_j = \{p^i a_j\}$.
Let us find the order of the element $p^i a_j$. Since $p^{m_j} a_j = 0$, we have
$p^{m_j - i}(p^i a_j) = 0$ if $i \le m_j$, and $p^i a_j = 0$ if $i \ge m_j$.

Let us prove that for $i \le m_j$, the number $p^{m_j - i}$ is precisely the order of the element
$p^i a_j$. Let this order be equal to some number $s$. Then $s$ must divide $p^{m_j - i}$, which means
that it is of the form $p^t$. If $t < m_j - i$, then the equality $p^t(p^i a_j) = 0$ would show
that $p^{t+i} a_j = 0$, that is, that the element $a_j$ had order less than $p^{m_j}$, which is
impossible. This means that $|p^i A_j| = p^{m_j - i}$ for $i \le m_j$. The fact that $p^i A_j = 0$ for
$i \ge m_j$ (which means that $|p^i A_j| = 1$) is obvious.

We can now literally repeat the argument that we used earlier. We see that in the
decomposition

$$p^i A = p^i A_1 \oplus p^i A_2 \oplus \cdots \oplus p^i A_r,$$

subgroups of order $p$ occur when $m_j - i = 1$, that is, $m_j = i + 1$, and this means that
in our adopted notation, they occur $l_{i+1}$ times. Likewise, the subgroups of order $p^2$
occur when $m_j = i + 2$, that is, $l_{i+2}$ times, and so on. Moreover, certain subgroups
will have order relatively prime to $p$. This means that

$$|p^i A| = p^{l_{i+1}} p^{2 l_{i+2}} \cdots n_i, \quad \text{where } (n_i, p) = 1.$$

In other words, in accordance with our previous notation, we have

$$l_{i+1} + 2 l_{i+2} + \cdots = r_i. \tag{13.26}$$

In particular, formula (13.25) is obtained from (13.26) for $i = 0$.

If we now subtract from each formula (13.26) the following one, we obtain that
for all $i = 1, 2, \ldots$, we have the equalities

$$l_i + l_{i+1} + \cdots = r_{i-1} - r_i.$$

Repeating the same process, we obtain

$$l_i = r_{i-1} - 2 r_i + r_{i+1}.$$

These relationships prove Theorem 13.23. □
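
The formula $l_i = r_{i-1} - 2 r_i + r_{i+1}$ can be checked numerically. The following is a
minimal Python sketch, not part of the original text. It assumes the group is given as a
direct sum of cyclic groups $\mathbb{Z}/n_1 \oplus \cdots \oplus \mathbb{Z}/n_s$, described by the list of
orders $n_j$, and it uses the standard fact that $k(\mathbb{Z}/n)$ is cyclic of order $n/\gcd(n, k)$.
The function names `p_exponent`, `order_of_kA`, and `multiplicities` are illustrative choices.

```python
from math import gcd

def p_exponent(n, p):
    """Exponent of the prime p in the factorization of the positive integer n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

def order_of_kA(orders, k):
    """Order of kA for A = Z/n_1 + ... + Z/n_s (orders = [n_1, ..., n_s]).

    Each summand k(Z/n) is cyclic of order n // gcd(n, k).
    """
    total = 1
    for n in orders:
        total *= n // gcd(n, k)
    return total

def multiplicities(orders, p, max_i):
    """Return {i: l_i} for i = 1..max_i, using l_i = r_{i-1} - 2*r_i + r_{i+1},
    where p**r_i is the exact power of p dividing |p^i A|."""
    r = [p_exponent(order_of_kA(orders, p ** i), p) for i in range(max_i + 2)]
    return {i: r[i - 1] - 2 * r[i] + r[i + 1] for i in range(1, max_i + 1)}
```

For instance, `multiplicities([2, 4, 4, 8, 9, 5], 2, 3)` returns `{1: 1, 2: 2, 3: 1}`: one summand
of order 2, two of order 4, and one of order 8. Since only the orders $|p^i A|$ enter the
computation, two different descriptions of the same group, such as `[12]` and `[4, 3]`, give
the same answer.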



Theorems 13.22 and 13.23 make it easy to give the number of distinct (up to 
isomorphism) finite abelian groups of a given order. 


Example 13.24 Suppose, for example, that we would like to determine the number
of distinct abelian groups of order $p^3 q^2$, where $p$ and $q$ are distinct prime numbers.
Theorem 13.22 shows that such a group can be represented in the form

$$A = C_1 \oplus \cdots \oplus C_s,$$

where the $C_i$ are cyclic groups whose orders are prime powers. From this decomposition,
it follows that

$$|A| = |C_1| \cdots |C_s|.$$

In other words, among the groups $C_i$, there is either one cyclic group of order $p^3$, or
one of order $p^2$ and one of order $p$, or three of order $p$. And likewise, there is one
of order $q^2$ or two of order $q$. Combining all these possibilities (three for groups
of order $p^i$ and two for groups of order $q^j$), we obtain six variants. Theorem 13.23
guarantees that of the six groups thus obtained, none is isomorphic to any of the
others.
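
The count in this example generalizes: by Theorems 13.22 and 13.23, the choices for each
prime $p$ with exponent $e$ correspond to the ways of writing $e$ as a sum of positive integers
(the exponents of the cyclic summands), and the choices for different primes are independent.
The sketch below is a hedged illustration of that count in Python. The function names and the
representation of the order as a dictionary `{prime: exponent}` are assumptions made for this
example only.

```python
def count_partitions(n, largest=None):
    """Number of ways to write n as a sum of positive integers, each at most `largest`."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    # Choose the largest part k first, then partition the rest into parts <= k.
    return sum(count_partitions(n - k, k) for k in range(1, min(n, largest) + 1))

def count_abelian_groups(factorization):
    """Number of abelian groups (up to isomorphism) of order prod(p**e).

    `factorization` maps each prime p to its exponent e in the order of the group.
    """
    total = 1
    for exponent in factorization.values():
        total *= count_partitions(exponent)
    return total
```

With $p = 2$ and $q = 3$, the order is $2^3 \cdot 3^2 = 72$, and `count_abelian_groups({2: 3, 3: 2})`
returns 6, in agreement with the six variants found above.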


13.4 Finitely Generated Torsion Modules over a Euclidean Ring* 

The proofs of the theorem on finite abelian groups and the theorem on Jordan normal
form (just like the proofs of the corresponding uniqueness theorems) are so
obviously parallel to each other that they surely are special cases of some more
general theorems. This is indeed the case, and the main goal of this chapter is the
proof of these general theorems. For this, we shall need two abstract (that is, defined
axiomatically) notions.

Definition 13.25 A ring is a set $R$ on which are defined two operations (that is, two
mappings $R \times R \to R$), one of which is called addition (for which the element that
is the image of two elements $a \in R$ and $b \in R$ is called their sum and is denoted by
$a + b$), and the second of which is multiplication (the element that is the image of
$a \in R$ and $b \in R$ is called their product and is denoted by $ab$). For these operations
of addition and multiplication, the following conditions must be satisfied:

(1) With respect to the operation of addition, the ring is an abelian group (the
identity element is denoted by 0).

(2) For all $a, b, c \in R$, we have

$$a(b + c) = ab + ac, \quad (b + c)a = ba + ca.$$

(3) For all $a, b, c \in R$, the associative property holds:

$$a(bc) = (ab)c.$$

In the sequel, we shall denote a ring by the letter $R$ and assume that it has a
multiplicative identity, that is, that it contains an element, which we shall denote by
1, satisfying the condition

$$a \cdot 1 = 1 \cdot a = a \quad \text{for all } a \in R.$$

In this chapter, we shall be considering only commutative rings, that is, it will be
assumed that

$$ab = ba \quad \text{for all } a, b \in R.$$

We have already encountered the most important special case of a ring, namely
an algebra, in connection with the construction of the exterior algebra of a vector
space, in Chap. 10. Let us recall that an algebra is a ring that is a vector space, where,
of course, consistency of the notions entering into these definitions is assumed. This
means that for every scalar $\alpha$ (in the field over which the vector space in question is
defined) and for all elements $a, b$ of the ring $R$, we have the equality $(\alpha a)b = \alpha(ab)$.
On the other hand, we are quite familiar with an example of a ring that is not an
algebra in any natural sense, namely the ring of integers $\mathbb{Z}$ with the usual arithmetic
operations of addition and multiplication.

Let us note a connection among the concepts we have introduced. If all nonzero
elements of a commutative ring form a group with respect to the operation of
multiplication, then such a ring is called a field. We assume that the reader is familiar
with the simplest properties of fields and rings.

The concept that generalizes both the concept of vector space (over some field 
K) with a linear transformation given on it and that of an abelian group is that of a 
module. 

Definition 13.26 An abelian group $M$ (its operation is written as addition) is a
module over a ring $R$ if there is defined an additional operation of multiplication
of the elements of the module $M$ by elements of the ring $R$ that produces elements
of the module and has the following properties:

$$a(\mathbf{m} + \mathbf{n}) = a\mathbf{m} + a\mathbf{n},$$

$$(a + b)\mathbf{m} = a\mathbf{m} + b\mathbf{m},$$

$$(ab)\mathbf{m} = a(b\mathbf{m}),$$

$$1\mathbf{m} = \mathbf{m},$$

for all elements $a, b \in R$ and all elements $\mathbf{m}, \mathbf{n} \in M$.

For convenience, we shall denote the elements of the ring using ordinary letters
$a, b, \ldots$, and elements of the module using boldface letters: $\mathbf{m}, \mathbf{n}, \ldots$

Example 13.27 An example of a module that we have encountered repeatedly is
that of a vector space over an arbitrary field $\mathbb{K}$ (here the ring $R$ is the field $\mathbb{K}$). On
the other hand, every abelian group $G$ is a module over the ring of integers $\mathbb{Z}$: the
operation defined on it of integral multiplication $kg$ for $k \in \mathbb{Z}$ and $g \in G$ obviously
possesses all the required properties.

Example 13.28 Let $\mathsf{L}$ be a vector space (real, complex, or over an arbitrary field $\mathbb{K}$)
and let $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ be a fixed linear transformation. Then we may consider $\mathsf{L}$ as a
module over the ring $R$ of polynomials in the single variable $x$ (real, complex, or
over a field $\mathbb{K}$), setting, as we did earlier, for a polynomial $f(x) \in R$ and vector
$\mathbf{e} \in \mathsf{L}$,

$$f(x)\mathbf{e} = f(\mathcal{A})(\mathbf{e}). \tag{13.27}$$

It is easily verified that all the properties appearing in the definition of a module are
satisfied.
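
As an illustration of the module structure (13.27), here is a small Python sketch that
evaluates $f(\mathcal{A})(\mathbf{e})$ by Horner's scheme. It is not taken from the text: the representation
of a polynomial as a coefficient list `[c0, c1, ..., cd]`, of the transformation as a square
matrix given by rows, and the function names `mat_vec` and `act` are all assumptions made
for this example.

```python
def mat_vec(A, v):
    """Product of the square matrix A (a list of rows) with the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def act(f, A, e):
    """Compute f(A)(e), where f = [c0, c1, ..., cd] stands for c0 + c1*x + ... + cd*x**d.

    Horner's scheme: f(A)e = c0*e + A(c1*e + A(c2*e + ...)).
    """
    result = [0] * len(e)
    for c in reversed(f):
        result = mat_vec(A, result)
        result = [r + c * x for r, x in zip(result, e)]
    return result
```

For the rotation matrix `A = [[0, -1], [1, 0]]`, the polynomial $x$ sends `[3, 5]` to
`act([0, 1], A, [3, 5]) == [-5, 3]`, while the polynomial $x^2 + 1$ gives
`act([1, 0, 1], A, [3, 5]) == [0, 0]`, since for this matrix $\mathcal{A}^2 = -\mathcal{E}$.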

Our immediate objective will be to find a restriction of the general notion of
module that covers vector spaces and abelian groups and then to prove theorems for
these that generalize Theorems 5.12 and 13.22.

These two examples, the ring of integers $\mathbb{Z}$ and the ring of polynomials in a
single complex variable (for simplicity, we shall restrict our attention to the special
case $\mathbb{K} = \mathbb{C}$, but many results are valid in the general case), have many similar
properties, the most important of which is the uniqueness of the decomposition into
irreducible factors, that is, prime numbers in the case of the ring of integers, and
linear polynomials in the case of the ring of polynomials with complex coefficients.
Both of these properties, in turn, derive from a single property: the possibility of
division with remainder, which we shall introduce in the definition of certain rings
for which it is possible to generalize the reasoning from previous sections.

Definition 13.29 A ring $R$ is called a Euclidean ring if

$$ab \ne 0 \quad \text{for all } a, b \in R,\ a \ne 0 \text{ and } b \ne 0,$$

and for nonzero elements $a$ of the ring, a function $\varphi(a)$ is defined taking nonnegative
integer values and exhibiting the following properties:

(1) $\varphi(ab) \ge \varphi(a)$ for all elements $a, b \in R$, $a \ne 0$, $b \ne 0$.

(2) For all elements $a, b \in R$, where $a \ne 0$, there exist $q, r \in R$ such that

$$b = aq + r \tag{13.28}$$

and either $r = 0$ or $\varphi(r) < \varphi(a)$.

For the ring of integers, these properties are satisfied for $\varphi(a) = |a|$, while for
the ring of polynomials, they are satisfied for $\varphi(a)$ equal to the degree of the
polynomial $a$.
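
Property (2) for the ring of polynomials can be made concrete with a short Python sketch of
division with remainder. It is an illustration under stated assumptions rather than anything
from the text: polynomials are coefficient lists `[c0, c1, ...]` over the rational numbers
(via `fractions.Fraction`), and the names `degree` and `poly_divmod` are hypothetical.

```python
from fractions import Fraction

def degree(f):
    """Degree of the polynomial with coefficient list f, or None for the zero polynomial."""
    for i in range(len(f) - 1, -1, -1):
        if f[i] != 0:
            return i
    return None

def poly_divmod(b, a):
    """Return (q, r) with b = a*q + r and either r = 0 or deg r < deg a."""
    da = degree(a)
    if da is None:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [Fraction(c) for c in b]
    q = [Fraction(0)] * max(len(b) - da, 1)
    dr = degree(r)
    while dr is not None and dr >= da:
        # Cancel the leading term of r with a multiple of a.
        coeff = r[dr] / Fraction(a[da])
        shift = dr - da
        q[shift] += coeff
        for i in range(da + 1):
            r[shift + i] -= coeff * a[i]
        dr = degree(r)
    return q, r
```

Dividing $b = x^3 + 2x + 1$ by $a = x^2 + 1$, that is, `poly_divmod([1, 2, 0, 1], [1, 0, 1])`,
gives the quotient $q = x$ and the remainder $r = x + 1$, whose degree 1 is less than the
degree 2 of $a$.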

Definition 13.30 An element $a$ of a ring $R$ is called a unit or invertible element if
there exists an element $b \in R$ such that $ab = 1$. An element $b$ is called a divisor of
the element $a$ (one also says that $a$ is divisible by $b$ or that $b$ divides $a$) if there exists
an element $c$ such that $a = bc$.

Clearly the property of divisibility is unchanged under multiplication of $a$ or $b$
by a unit. Two elements that differ by a unit factor are called associates. For example,
in the ring of integers, the units are $+1$ and $-1$, and associates are integers that
are either equal or differ by a sign. In the ring of polynomials, the units are the
constant polynomials other than the one that is identically zero, and associates are
polynomials that differ from each other by a constant nonzero multiple.

An element $p$ of a ring is prime if it is not a unit and has no divisors other than
its associates and units.

The theory of decomposition into prime factors in a Euclidean ring repeats exactly
what is known for the ring of integers.

If an element $a$ is not prime, then it has a divisor $b$ such that $a = bc$, with $c$ not a
unit. This means that $a$ is not a divisor of $b$, and there exists the representation
$b = aq + r$ with $\varphi(r) < \varphi(a)$. But $r = b - aq = b(1 - cq)$, and therefore $\varphi(r) \ge \varphi(b)$,
that is, $\varphi(b) \le \varphi(r) < \varphi(a)$, which means that $\varphi(b) < \varphi(a)$. Applying the same
reasoning to $b$, we finally arrive at a prime divisor of $a$, and in this way we can show that
every element can be represented as the product of primes. The same argument as used in
the case of integers or polynomials shows the uniqueness of this decomposition in
the following precise sense.

Theorem 13.31 If some element $a$ in a Euclidean ring $R$ has two factorizations
into prime factors,

$$a = p_1 \cdots p_r, \quad a = q_1 \cdots q_s,$$

then $r = s$, and with a suitable numbering of the factors, $p_i$ and $q_i$ are associates
for all $i$.

As in the ring of integers, in every Euclidean ring, each element $a \ne 0$ that is not
a unit can be written in the form

$$a = u p_1^{n_1} \cdots p_r^{n_r},$$

where $u$ is a unit, all the $p_i$ are prime elements with no two of them associates, and
the $n_i$ are natural numbers. Such a representation is unique in a natural sense.

As in the ring of integers or of polynomials in one variable, representation (13.28)
for $r \ne 0$ can be applied to the elements $a$ and $r$ and repeated until we arrive at $r = 0$.
We will thus obtain a greatest common divisor (gcd) of the elements $a$ and $b$, that
is, a common divisor such that every other common divisor is a divisor of it. The
greatest common divisor of $a$ and $b$ is denoted by $d = (a, b)$ or $d = \gcd(a, b)$. This
process, as it is for integers, is called the Euclidean algorithm (whence the name
Euclidean ring). It follows from the Euclidean algorithm that a greatest common
divisor of elements $a$ and $b$ can be written in the form $d = ax + by$, where $x$ and $y$
are some elements of the ring $R$.

Two elements $a$ and $b$ are said to be relatively prime if their only common
divisors are units. Then we may consider that $\gcd(a, b) = 1$, and as follows from the
Euclidean algorithm, there exist elements $x, y \in R$ such that

$$ax + by = 1. \tag{13.29}$$
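
For the ring of integers, the elements $x$ and $y$ in the representation $d = ax + by$ can be
computed by carrying the coefficients along through the Euclidean algorithm. The following
Python sketch shows one common way to do this for nonnegative integers; the function name
`extended_gcd` is an illustrative choice and does not come from the text.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) and d = a*x + b*y, for nonnegative integers a, b."""
    # Invariants: old_r = a*old_x + b*old_y and r = a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r  # division with remainder: old_r = q*r + (old_r - q*r)
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y
```

For example, `extended_gcd(240, 46)` returns `(2, -9, 47)`, and indeed
$240 \cdot (-9) + 46 \cdot 47 = 2$. For relatively prime arguments, the last two values give
a solution of (13.29).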

Let us now recall that the theorem on Jordan normal form holds in the case
of finite-dimensional vector spaces, and that the fundamental theorem of abelian
groups holds for finite abelian groups. Let us now derive analogous finiteness
conditions for modules.

Definition 13.32 A module $M$ is said to be finitely generated if it contains a finite
collection of elements $\mathbf{m}_1, \ldots, \mathbf{m}_r$, called generators, such that every element
$\mathbf{m} \in M$ can be expressed in the form

$$\mathbf{m} = a_1 \mathbf{m}_1 + \cdots + a_r \mathbf{m}_r \tag{13.30}$$

for some elements $a_1, \ldots, a_r$ of the ring $R$.

For a vector space considered as a module over a certain field, this is the definition
of finite dimensionality, and representation (13.30) is a representation of a
vector $\mathbf{m}$ in the form of a linear combination of vectors $\mathbf{m}_1, \ldots, \mathbf{m}_r$ (let us note that
the system of vectors $\mathbf{m}_1, \ldots, \mathbf{m}_r$ will in general not be a basis, since we did not
introduce the concept of linear independence). In the case of a finite abelian group,
we may always take for $\mathbf{m}_1, \ldots, \mathbf{m}_r$ all the elements of the group.

Let us formulate one additional condition of the same type. 

Definition 13.33 An element $\mathbf{m}$ of a module $M$ over a ring $R$ is said to be a torsion
element if there exists an element $a_{\mathbf{m}} \ne 0$ of the ring $R$ such that

$$a_{\mathbf{m}} \mathbf{m} = \mathbf{0},$$

where $\mathbf{0}$ is the null element of the module $M$, and the subscript in $a_{\mathbf{m}}$ is introduced
to show that this element depends on $\mathbf{m}$. A module is called a torsion module if all
of its elements are torsion elements.

In a finitely generated torsion module, there is an element $a \ne 0$ of the ring $R$
such that $a\mathbf{m} = \mathbf{0}$ for all elements $\mathbf{m} \in M$. Indeed, it suffices to set $a = a_{\mathbf{m}_1} \cdots a_{\mathbf{m}_r}$
for the elements $\mathbf{m}_1, \ldots, \mathbf{m}_r$ in representation (13.30). If the ring $R$ is Euclidean,
then we can conclude that $a \ne 0$. For the case of a finite abelian group, we may take
$a$ to be the order of the group.

Example 13.34 Let $M$ be a module determined by a vector space $\mathsf{L}$ of dimension
$n$ and by a linear transformation $\mathcal{A}$ according to formula (13.27). For an arbitrary
vector $\mathbf{e} \in \mathsf{L}$, let us consider the vectors

$$\mathbf{e},\ \mathcal{A}(\mathbf{e}),\ \ldots,\ \mathcal{A}^n(\mathbf{e}).$$

Their number, $n + 1$, is greater than the dimension $n$ of the space $\mathsf{L}$, and therefore,
these vectors are linearly dependent, which means that there exists a polynomial
$f(x)$, not identically zero, such that $f(\mathcal{A})(\mathbf{e}) = \mathbf{0}$, that is, in our module $M$, the
element $\mathbf{e}$ is a torsion element.

But if, as we did in Example 13.27, we view a vector space as a module over
the field $\mathbb{R}$ or $\mathbb{C}$, then not a single nonnull vector will be a torsion element of the
module.

Let $M$ be a module over a ring $R$. A subgroup $M'$ of the group $M$ is called a
submodule if for all elements $a \in R$ and $\mathbf{m}' \in M'$, we have $a\mathbf{m}' \in M'$.

Example 13.35 It is obvious that every subgroup of an abelian group viewed as a
module over the ring of integers is a submodule. Analogously, for a vector space
viewed as a module over a ring coinciding with a suitable field, every subspace is a
submodule. If $M$ is a module defined by a vector space $\mathsf{L}$ and a linear transformation
$\mathcal{A}$ of $\mathsf{L}$ according to formula (13.27), then as is easily verified, every submodule of
$M$ is a vector subspace that is invariant with respect to the transformation $\mathcal{A}$.

If $M' \subset M$ is a submodule, and $\mathbf{m}$ is any element of the module $M$, then it is
easily verified that the collection of all elements of the form $a\mathbf{m} + \mathbf{m}'$, where $a$ is
an arbitrary element of the ring $R$, and $\mathbf{m}'$ is an arbitrary element of the submodule
$M'$, is a submodule. We shall denote it by $(\mathbf{m}, M')$.

Since we are assuming that the ring $R$ is Euclidean, it follows that for every
torsion element $\mathbf{m} \in M$, there exists an element $a \in R$ that exhibits the property
$a\mathbf{m} = \mathbf{0}$ and is such that $\varphi(a)$ is the smallest value among all elements with this
property. Then every element $c$ for which $c\mathbf{m} = \mathbf{0}$ is divisible by $a$. Indeed, if such
were not the case, we would have the relationship

$$c = aq + r, \quad \varphi(r) < \varphi(a),$$

and clearly $r\mathbf{m} = \mathbf{0}$, which contradicts the definition of $a$. In particular, two such
elements $a$ and $a'$ divide each other; that is, they are associates. The element $a \in R$
is called the order of the element $\mathbf{m} \in M$. One must keep in mind that this expression
is not quite precise, since order is defined only up to associates.

Example 13.36 If, as in Example 13.28, a module is a vector space $\mathsf{L}$ viewed as a
module over the ring of polynomials $f(x)$ with the aid of formula (13.27), then every
element $\mathbf{e} \in \mathsf{L}$ is a torsion element, and its order is the same as the minimal
polynomial of the vector $\mathbf{e}$ (see the definition on p. 146), and the indicated property (every
element $c$ for which $c\mathbf{m} = \mathbf{0}$ is divisible by the order of the element $\mathbf{m}$) coincides
with Theorem 4.23.

Definition 13.37 A submodule $M'$ of a module $M$ is said to be cyclic if it contains
an element $\mathbf{m}'$ such that all the elements of the module $M'$ can be represented in the
form $a\mathbf{m}'$ with some $a \in R$. This is written $M' = \{\mathbf{m}'\}$.

Definition 13.38 A module $M$ is called the direct sum of its submodules $M_1, \ldots, M_r$
if every element $\mathbf{m} \in M$ can be written as a sum

$$\mathbf{m} = \mathbf{m}_1 + \cdots + \mathbf{m}_r, \quad \mathbf{m}_i \in M_i,$$

and such a representation is unique. It is obvious that to establish the uniqueness of
this decomposition, it suffices to prove that if $\mathbf{m}_1 + \cdots + \mathbf{m}_r = \mathbf{0}$, $\mathbf{m}_i \in M_i$, then
$\mathbf{m}_i = \mathbf{0}$ for all $i$. The decomposition into a direct sum is written as the equality

$$M = M_1 \oplus \cdots \oplus M_r.$$

The fundamental theorem that we shall prove, which contains Theorem 5.12 on
the Jordan normal form and Theorem 13.22 on finite abelian groups as special cases,
is the following.

Theorem 13.39 Every finitely generated torsion module $M$ over a Euclidean ring
$R$ is the direct sum of cyclic submodules

$$M = C_1 \oplus \cdots \oplus C_r, \quad C_i = \{\mathbf{m}_i\}, \tag{13.31}$$

such that the order of each element $\mathbf{m}_i$ is a power of a prime element of the ring $R$.

Example 13.40 If $M$ is a finite abelian group viewed as a module over the ring
of integers, then this theorem reduces directly to the fundamental theorem of finite
abelian groups (Theorem 13.22).

Let the module $M$ be determined by the finite-dimensional complex vector space
$\mathsf{L}$ and the linear transformation $\mathcal{A}$ of $\mathsf{L}$ according to formula (13.27). Then the $C_i$
are vector subspaces invariant with respect to $\mathcal{A}$, and in each of these, there exists a
vector $\mathbf{m}_i$ such that all the remaining vectors can be written in the form $f(\mathcal{A})(\mathbf{m}_i)$.
The prime elements in the ring of complex polynomials are the polynomials of the
form $x - \lambda$. By assumption, for each vector $\mathbf{m}_i$, there exist some $\lambda_i$ and a natural
number $n_i$ such that

$$(\mathcal{A} - \lambda_i \mathcal{E})^{n_i}(\mathbf{m}_i) = \mathbf{0}.$$

If we take the smallest possible value $n_i$, then as proved in Sect. 5.1, the vectors

$$\mathbf{m}_i,\ (\mathcal{A} - \lambda_i \mathcal{E})(\mathbf{m}_i),\ \ldots,\ (\mathcal{A} - \lambda_i \mathcal{E})^{n_i - 1}(\mathbf{m}_i)$$

will form a basis of this subspace, that is, $C_i$ is a cyclic subspace corresponding to
the principal vector $\mathbf{m}_i$. We obtain the fundamental theorem on Jordan form
(Theorem 5.12).

Let us recall that we proved Theorem 5.12 by induction on the dimension of the
space. More precisely, for a linear transformation $\mathcal{A}$ on the space $\mathsf{L}$, we constructed
a subspace $\mathsf{L}'$ invariant with respect to $\mathcal{A}$ of dimension 1 less and proved the theorem
for $\mathsf{L}$ on the assumption that it had been proved already for $\mathsf{L}'$. In fact, this meant
that we constructed a sequence of nested subspaces

$$\mathsf{L} = \mathsf{L}_0 \supset \mathsf{L}_1 \supset \mathsf{L}_2 \supset \cdots \supset \mathsf{L}_n \supset \mathsf{L}_{n+1} = (\mathbf{0}), \tag{13.32}$$

invariant with respect to $\mathcal{A}$ and such that $\dim \mathsf{L}_{i+1} = \dim \mathsf{L}_i - 1$. Then we reduced
the proof of Theorem 5.12 for $\mathsf{L}$ to the proof of the theorem for $\mathsf{L}_1$, then for $\mathsf{L}_2$,
and so on. Now our first goal will be to construct in every finitely generated torsion
module a sequence of submodules analogous to the sequence of subspaces (13.32).

Lemma 13.41 In every finitely generated torsion module $M$ over a Euclidean ring
$R$, there exists a sequence of submodules

$$M = M_0 \supset M_1 \supset M_2 \supset \cdots \supset M_n \supset M_{n+1} = \{\mathbf{0}\} \tag{13.33}$$

such that $M_i \ne M_{i+1}$, $M_i = (\mathbf{m}_i, M_{i+1})$, where the $\mathbf{m}_i$ are elements of the module $M$,
and for each of these, there exists a prime element $p_i$ of the ring $R$ such that
$p_i \mathbf{m}_i \in M_{i+1}$.

Proof By the definition of a finitely generated module, there exists a finite number
of generators $\mathbf{m}_1, \ldots, \mathbf{m}_r \in M$ such that the elements $a_1 \mathbf{m}_1 + \cdots + a_r \mathbf{m}_r$ exhaust all
the elements of the module $M$ as $a_1, \ldots, a_r$ run through all elements of the ring $R$.
The collection of elements of the form $a_k \mathbf{m}_k + \cdots + a_r \mathbf{m}_r$, where $a_k, \ldots, a_r$ are all
possible elements of the ring $R$, obviously forms a submodule of the module $M$. Let
us denote it by $M_k$. It is obvious that $M_k \supset M_{k+1}$ and $M_k = (\mathbf{m}_k, M_{k+1})$. Without
loss of generality, we may assume that $M_k \ne M_{k+1}$, since otherwise, the element $\mathbf{m}_k$
can be excluded from among the generators. The constructed chain of submodules
$M_k$ is still not the chain of submodules $M_i$ that figures in Lemma 13.41. We
obtain that chain from the chain of submodules $M_k$ by putting several intermediate
submodules between the modules $M_k$ and $M_{k+1}$.

Since $\mathbf{m}_k \in M$ is a torsion element, there exists a nonzero element $a \in R$ for which
$a\mathbf{m}_k = \mathbf{0}$ and in particular, $a\mathbf{m}_k \in M_{k+1}$. Let $a$ be an element of the ring $R$ for
which $a\mathbf{m}_k \in M_{k+1}$ and $\varphi(a)$ assumes the smallest value among elements with this
property. If the element $a$ is prime, then we set $p_1 = a$, and then it is unnecessary to
place a submodule between $M_k$ and $M_{k+1}$. But if $a$ is not prime, then let $p_1$ be one
of its prime divisors and $a = p_1 b$. Let us set $\mathbf{m}_{k,1} = b\mathbf{m}_k$ and $M_{k,1} = (\mathbf{m}_{k,1}, M_{k+1})$.
Then clearly, $p_1 \mathbf{m}_{k,1} \in M_{k+1}$ and $b\mathbf{m}_k \in M_{k,1}$. As we have seen, $\varphi(b) < \varphi(a)$ (strict
inequality). Therefore, repeating this process a finite number of times, we will place
a finite number of submodules (13.33) with the required properties between $M_k$ and
$M_{k+1}$. □

Remark 13.42 It is possible to show that the length of every chain of the form
(13.33) satisfying the conditions of Lemma 13.41 is the same number $n$. Moreover,
every chain of submodules

$$M = M_0 \supset M_1 \supset M_2 \supset \cdots \supset M_m$$

in which $M_i \ne M_{i+1}$ has length $m \le n$, and this holds with much milder restrictions
on the ring $R$ and module $M$ than we have assumed in this chapter. What is of
essence here is only that between any two neighboring submodules $M_i$ and $M_{i+1}$,
there does not exist an "intermediate" submodule $M'_i$ different from $M_i$ and $M_{i+1}$
such that $M_i \supset M'_i \supset M_{i+1}$.

For example, let us consider an $n$-dimensional vector space $\mathsf{L}$ over a field $\mathbb{K}$ as
a module over the ring $R = \mathbb{K}$. Let $\mathbf{a}_1, \ldots, \mathbf{a}_n$ be some basis. Then the subspaces
$\mathsf{L}_i = \langle \mathbf{a}_i, \ldots, \mathbf{a}_n \rangle$, $i = 1, \ldots, n$, have the indicated property. Using this, we could
give a definition of the dimension of a vector space without appealing to the notion
of linear dependence. Thus the length $n$ of all chains of the form (13.33) satisfying
the conditions of Lemma 13.41 is the "correct" generalization of dimension of a
space to finitely generated torsion modules.

The following lemma is analogous to the one we used in the proof of Theorems 5.12
and 13.22.

Lemma 13.43 If the order of an element $\mathbf{m}$ of a module $M$ is the power of a prime
element, $p^n \mathbf{m} = \mathbf{0}$, and an element $\mathbf{x}$ of the cyclic submodule $\{\mathbf{m}\}$ is not divisible by
$p$ (that is, not representable in the form $\mathbf{x} = p\mathbf{y}$, where $\mathbf{y} \in M$), then $\{\mathbf{m}\} = \{\mathbf{x}\}$.

Proof It is obvious that $\{\mathbf{x}\} \subset \{\mathbf{m}\}$. Thus it remains to show that $\{\mathbf{m}\} \subset \{\mathbf{x}\}$, and
for this, it suffices to ascertain that $\mathbf{m} \in \{\mathbf{x}\}$. By assumption, $\mathbf{x} = a\mathbf{m}$, where $a$ is
some element of the ring $R$. If $a$ were divisible by $p$, then clearly, $\mathbf{x}$ would also be
divisible by $p$. Indeed, if $a = pb$ with some $b \in R$, then from the equality $\mathbf{x} = a\mathbf{m}$, we obtain
$\mathbf{x} = p\mathbf{y}$, where $\mathbf{y} = b\mathbf{m}$, contradicting the assumption that $\mathbf{x}$ is not divisible by $p$.

This means that $a$ and $p$ are relatively prime, and consequently, in view of the
uniqueness of the decomposition into prime elements of the ring $R$, $a$ is also
relatively prime to $p^n$. Then on the basis of the Euclidean algorithm, we can find
elements $u$ and $v$ in $R$ such that $au + p^n v = 1$. Multiplying both sides of this equality
by $\mathbf{m}$, we obtain that $\mathbf{m} = u\mathbf{x}$, which means that $\mathbf{m} \in \{\mathbf{x}\}$. □
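
The simplest instance of this lemma is the cyclic group $\mathbb{Z}/p^n$ viewed as a module over $\mathbb{Z}$,
with generator $\mathbf{m} = 1$. The Python sketch below is an illustration under that assumption
only. Given $\mathbf{x} = a\mathbf{m}$ with $a$ not divisible by $p$, it finds $u$ with $u\mathbf{x} = \mathbf{m}$; here the built-in
`pow(x, -1, modulus)` (available in Python 3.8 and later) returns $u$ with $xu \equiv 1 \pmod{p^n}$, which
is the relation $au + p^n v = 1$ of the proof for a suitable $v$. The function name
`recover_generator` is hypothetical.

```python
def recover_generator(x, p, n):
    """In Z/p^n with generator m = 1: return u such that u*x = m (mod p^n).

    Requires that x is not divisible by p, as in Lemma 13.43.
    """
    modulus = p ** n
    if x % p == 0:
        raise ValueError("x is divisible by p, so it does not generate the module")
    # pow with exponent -1 computes the inverse of x modulo p^n.
    return pow(x, -1, modulus)
```

For $p = 3$, $n = 3$, and $\mathbf{x} = 5$, the call `recover_generator(5, 3, 3)` returns 11, and
$11 \cdot 5 = 55 = 2 \cdot 27 + 1$, so the multiples of $\mathbf{x}$ exhaust the whole group $\mathbb{Z}/27$.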

Lemma 13.44 Let $M_1$ be a submodule of the module $M$ over a Euclidean ring
$R$ such that $M = (\mathbf{m}, M_1)$ and $M \ne M_1$. Then if for some $a, p \in R$, we have the
inclusions $a\mathbf{m} \in M_1$ and $p\mathbf{m} \in M_1$, where the element $p$ is prime, then $a$ is divisible
by $p$.

Proof Let us assume that $a$ is not divisible by $p$. Since the element $p$ is prime,
we have $(a, p) = 1$, and from the Euclidean algorithm in the ring $R$, it follows that
there exist two elements $u, v \in R$ for which $au + pv = 1$. Multiplying both sides
of this equality by $\mathbf{m}$, taking into account the inclusions $a\mathbf{m} \in M_1$ and $p\mathbf{m} \in M_1$,
we obtain that $\mathbf{m} \in M_1$. By definition, $(\mathbf{m}, M_1)$ consists of elements $b\mathbf{m} + \mathbf{m}'$ for all
possible $b \in R$ and $\mathbf{m}' \in M_1$. Therefore, $M = (\mathbf{m}, M_1) = M_1$, which contradicts the
assumption of the lemma. □

Proof of Theorem 13.39 The proof is an almost verbatim repetition of the proof
of Theorems 5.12 and 13.22. We may use induction on the length $n$ of the chain
(13.33), that is, we may assume the theorem to be true for the module $M_1$. Let

$M_1 = C_1 \oplus \cdots \oplus C_r$, (13.34)

where $C_i = \{c_i\}$ are cyclic submodules, and the order of each element $c_i$ is the
power of a prime element. By Lemma 13.16, $M = (m, M_1)$ and $pm \in M_1$, where $p$
is a prime element. Then based on the decomposition (13.34), we have

$pm = z_1 + \cdots + z_r$, $z_i \in C_i$. (13.35)

We shall select those elements $z_i$ that are divisible by $p$. By a change in numeration,
we may assume that these are the first $s - 1$ terms. Let us set $z_i = p z_i'$ for
$i = 1, \ldots, s - 1$. We must now consider two cases.
Case 1: The number $s - 1$ is equal to $r$. Then $pm = pm'$, where $m' = z_1' + \cdots + z_r'$.

Let us set $\bar{m} = m - m'$. It is obvious that $p\bar{m} = 0$. We shall prove that the module
$M$ can be written in the form

$M = \{\bar{m}\} \oplus C_1 \oplus \cdots \oplus C_r$.

Indeed, by assumption, every element $x \in M$ can be represented in the form $x =
am + y$, where $a \in R$ and $y \in M_1$, which means also in the form $x = a\bar{m} + y'$,
where $y' = am' + y \in M_1$.

Let us prove that for two such representations

$x = a\bar{m} + y$, $x = a'\bar{m} + y'$, (13.36)

we have the equalities $a\bar{m} = a'\bar{m}$ and $y = y'$. From this it will follow that

$M = \{\bar{m}\} \oplus M_1 = \{\bar{m}\} \oplus C_1 \oplus \cdots \oplus C_r$,

which in our case, is relationship (13.31).

We obtain from equalities (13.36) that $\bar{a}\bar{m} = \bar{y}$, where $\bar{a} = a - a'$, $\bar{y} = y' - y$,
and by assumption, $\bar{y} \in M_1$. By Lemma 13.16, there exists a prime element $p$ of the
ring $R$ such that $pm \in M_1$, and this means that $p\bar{m} \in M_1$. By Lemma 13.20, from
the inclusions $\bar{a}\bar{m} \in M_1$ and $p\bar{m} \in M_1$, it follows that the element $\bar{a}$ is divisible
by $p$, that is, $\bar{a} = bp$ for some $b \in R$. From this, we obviously obtain that $\bar{a}\bar{m} =
b(p\bar{m}) = 0$. Consequently, $a\bar{m} = a'\bar{m}$ and $y = y'$.

Case 2: The number $s - 1$ is less than $r$. If an element $c_i$ has order $p_i^{n_i}$ and $p_i$ is
not an associate of $p$, then $p_i^{n_i}$ is not divisible by $p$, and therefore, every element of
the module $C_i = \{c_i\}$ is divisible by $p$, by Lemma 13.17. Therefore, among the chosen
$s - 1$ submodules $C_i$ are all those such that the order of the element $c_i$ is $p_i^{n_i}$, and $p_i$
is not an associate of $p$. Since the order of an element is in general defined only up
to replacing it by an associate, we may consider that in the remaining submodules
$C_s = \{c_s\}, \ldots, C_r = \{c_r\}$, the order of the element $c_i$ is a power of $p$.

By construction, in the decomposition (13.35), we have $z_i = p z_i'$, $z_i' \in C_i$, for all
$i = 1, \ldots, s - 1$. Setting $z_1' + \cdots + z_{s-1}' = z'$ and $\bar{m} = m - z'$, we obtain the equality

$p\bar{m} = z_s + \cdots + z_r$. (13.37)

Since the order of the element $c_i$ for $i = s, \ldots, r$ is a power of $p$, the order of an
arbitrary element $z_i$ in the decomposition (13.37) is also a power of $p$. Let us denote
it by $p^{n_i}$. Obviously, we may choose the numeration of the terms in formula (13.37)
in such a way that the numbers $n_i$ do not decrease: $1 \le n_s \le n_{s+1} \le \cdots \le n_r$. Let us
prove that the order of the element $\bar{m}$ is equal to $p^{n_r+1}$ and that we have the equality

$M = \{\bar{m}\} \oplus C_1 \oplus \cdots \oplus C_{s-1} \oplus \cdots \oplus C_{r-1}$,

that is, in the decomposition, all submodules $C_i$ occur other than $C_r$. With this,
relationship (13.31) will be proved in the second case as well; that is, the proof of
Theorem 13.39 will be complete.
Multiplying both sides of equality (13.37) by $p^{n_r}$ and using the fact that $p^{n_r} z_i =
0$ for all $i = s, \ldots, r$, we obtain that $p^{n_r+1}\bar{m} = 0$. If the order $a$ of the element $\bar{m}$
is not an associate of $p^{n_r+1}$, then it divides it, and is equal, up to an associate, to
$p^k$ for some $k < n_r + 1$. Multiplying relationship (13.37) by $p^{k-1}$ and using the
fact that the submodules $C_1, \ldots, C_r$ form a direct sum, we obtain that $p^{k-1} z_i = 0$
for all $i = s, \ldots, r$. In particular, $p^{k-1} z_r = 0$, and this contradicts the assumption
$k < n_r + 1$ and that the order of the element $z_r$ is equal to $p^{n_r}$. Thus the order of the
element $\bar{m}$ is equal to $p^{n_r+1}$.

Let us note that by construction, in the decomposition (13.37), the element $z_r$ is
not divisible by $p$.

From what we have proved, on the basis of Lemma 13.17, it follows that $\{z_r\} =
\{c_r\} = C_r$. From this it follows that every element $m \in M$ can be represented as a
sum of elements of the modules

$\{\bar{m}\}, C_1, \ldots, C_{s-1}, \ldots, C_{r-1}$. (13.38)

Indeed, an analogous assertion holds for the modules

$\{\bar{m}\}, C_1, \ldots, C_{s-1}, \ldots, C_r$, (13.39)

since by our construction, $\bar{m} = m - z'$ and $z' = z_1' + \cdots + z_{s-1}'$, where $z_i' \in C_i$.
Consequently, $m = \bar{m} + z_1' + \cdots + z_{s-1}'$, which means that every element $m \in M$ is
a sum of elements of the modules (13.39).

We now must verify that every element of the submodule $C_r$ can be represented
as a sum of elements of the submodules (13.38). Since $C_r = \{z_r\}$, it suffices to verify
this for a single element $z_r$. But relationship (13.37) gives us precisely the required
representation:

$z_r = p\bar{m} - z_s - \cdots - z_{r-1}$.

It remains to verify the second condition entering into the definition of a direct sum:
that such a representation is unique. To this end, it suffices to prove that in the
relationship

$a\bar{m} + f_1 + \cdots + f_{s-1} + \cdots + f_{r-1} = 0$, $f_i \in C_i$, (13.40)

all the terms must equal 0.

Indeed, from relationship (13.40), taking into account (13.34), it follows that
$a\bar{m} \in M_1$. But by the construction of the element $\bar{m}$, we then also have $am \in M_1$.
By Lemma 13.20, from the inclusions $am \in M_1$ and $pm \in M_1$, we have that the
element $a$ is divisible by $p$, that is, $a = bp$ for some $b \in R$. Furthermore, we know
that

$p\bar{m} = z_s + \cdots + z_r$,

and moreover, the order of the element $z_r$ is $p^{n_r}$, while the order of the element $\bar{m}$ is
$p^{n_r+1}$. On substituting all these relationships into decomposition (13.40), we obtain

$b(z_s + \cdots + z_r) + f_1 + \cdots + f_{s-1} + \cdots + f_{r-1} = 0$.

Then it follows from formula (13.34) that $b z_r = 0$, and since the order of the element
$z_r$ is equal to $p^{n_r}$, we have that $p^{n_r}$ divides $b$. This means that the element $a$ is
divisible by $p^{n_r+1}$, and $a\bar{m} = 0$. But then from equality (13.40), it follows that
$f_1 + \cdots + f_{r-1} = 0$. Using again the induction hypothesis (13.34), we obtain that
$f_1 = 0, \ldots, f_{r-1} = 0$. This completes the proof of Theorem 13.39. □

For Theorem 13.39, we have the same uniqueness theorem as in the case of
Theorem 5.12 and Theorem 13.22. Namely, if

$M = C_1 \oplus \cdots \oplus C_r$, $C_i = \{m_i\}$, $M = D_1 \oplus \cdots \oplus D_s$, $D_j = \{n_j\}$

are two decompositions of a finitely generated torsion module $M$ in which the orders
of the elements $m_i$ and $n_j$ are prime powers, that is, $p_i^{r_i} m_i = 0$ and $q_j^{s_j} n_j = 0$, where
$p_i$ and $q_j$ are prime elements, then with a suitable numeration of the terms $C_i$ and
$D_j$, the elements $p_i$ and $q_i$ are associates, and $r_i = s_i$. However, a natural proof of this
theorem would require some new concepts, and we shall not pursue this here.


Chapter 14

Elements of Representation Theory


Representation theory is one of the most “applied” branches of algebra. It has many
applications in various branches of mathematics and mathematical physics. In this
chapter, we shall be concerned with the problem of finding all finite-dimensional
representations of finite groups. But there is an analogous theory that has been
developed for certain types of infinite groups, which is important in many other branches
of mathematics.


14.1 Basic Concepts of Representation Theory

Let us recall some definitions from the previous chapter that will play a key role
here.

A homomorphism of a group $G$ into a group $G'$ is a mapping $f : G \to G'$ such
that for every pair of elements $g_1, g_2 \in G$, we have the relationship

$f(g_1 g_2) = f(g_1) f(g_2)$.

An isomorphism of a group $G$ onto a group $G'$ is a bijective homomorphism $f :
G \to G'$. Groups $G$ and $G'$ are said to be isomorphic if there exists an isomorphism
$f : G \to G'$ between them. This is denoted by $G \simeq G'$.

Definition 14.1 A representation of a group $G$ is a homomorphism of $G$ into the
group of nonsingular linear transformations of a vector space $L$. The space $L$ is called
the space of the representation or the representation space, and its dimension, that
is, $\dim L$, is the dimension of the representation.


Thus in order to specify a representation of a group $G$, it is necessary to associate
with each element $g \in G$ a nonsingular linear transformation $\mathcal{A}_g : L \to L$ such that
for $g_1, g_2 \in G$, the condition

$\mathcal{A}_{g_1 g_2} = \mathcal{A}_{g_1} \mathcal{A}_{g_2}$ (14.1)

is satisfied. Since the group of nonsingular linear transformations of an
$n$-dimensional vector space is isomorphic to the group of nonsingular square matrices
of order $n$, to give a representation, it suffices to associate with each element $g \in G$
a nonsingular square matrix $A_g$ such that (14.1) is satisfied.

It follows at once from (14.1) that for a representation $\mathcal{A}_g$ and any number of
elements $g_1, \ldots, g_k$ of the group $G$, we have the relationship

$\mathcal{A}_{g_1 \cdots g_k} = \mathcal{A}_{g_1} \cdots \mathcal{A}_{g_k}$. (14.2)

Moreover, it is obvious that if $e$ is the identity element of $G$, then

$\mathcal{A}_e = \mathcal{E}$, (14.3)

where $\mathcal{E}$ is the identity linear transformation of the space $L$. And if $g^{-1}$ is the inverse
of the element $g$, then

$\mathcal{A}_{g^{-1}} = \mathcal{A}_g^{-1}$, (14.4)

that is, $\mathcal{A}_{g^{-1}}$ is the transformation that is the inverse of $\mathcal{A}_g$.


Example 14.2 Let $G = GL_n$ be the group of nonsingular square matrices of order $n$.
For each matrix $g \in GL_n$, let us set

$\mathcal{A}_g = |g|$.

Since $|g|$ is a number, which by assumption is different from zero, we have a
one-dimensional representation. It is obvious that for every integer $n$, the equality

$\mathcal{A}_g = |g|^n$

will also define a one-dimensional representation.
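Condition (14.1) for this example is just the multiplicativity of the determinant. The sketch below checks it for two matrices of order 2 with exact rational arithmetic; the matrices, the exponent, and the helper names det2 and matmul2 are hypothetical choices made for illustration.

```python
from fractions import Fraction

def det2(g):
    """Determinant |g| of a 2x2 matrix given as a tuple of rows."""
    (a, b), (c, d) = g
    return a * d - b * c

def matmul2(g, h):
    """Product gh of two 2x2 matrices."""
    return tuple(
        tuple(sum(g[i][k] * h[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

g = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(1)))    # |g| = 1
h = ((Fraction(0), Fraction(3)), (Fraction(-1), Fraction(5)))   # |h| = 3

exponent = -2   # any integer exponent gives a one-dimensional representation
lhs = det2(matmul2(g, h)) ** exponent
rhs = det2(g) ** exponent * det2(h) ** exponent
print(lhs == rhs)
```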


Example 14.3 Let $G = S_n$ be the symmetric group of degree $n$, that is, the group of
permutations of an $n$-element set $M$, and let $L$ be a vector space of dimension $n$, in
which we have chosen a basis $e_1, \ldots, e_n$. For the permutation

$g = \begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix}$

let us define $\mathcal{A}_g$ as the linear transformation such that

$\mathcal{A}_g(e_1) = e_{j_1}$, $\mathcal{A}_g(e_2) = e_{j_2}$, $\ldots$, $\mathcal{A}_g(e_n) = e_{j_n}$.

Then we obtain an $n$-dimensional representation of the group $S_n$.

To avoid having to use a specific numeration of the elements of the set $M$, let
us associate with the element $a \in M$, the basis vector $e_a$. Then the representation
described above is given by the formula

$\mathcal{A}_g(e_a) = e_b$ if $g(a) = b$,

for every transformation $g : M \to M$.
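The rule that the transformation for g sends e_a to e_b when g(a) = b can be checked mechanically for n = 3. This is a minimal sketch, assuming permutations are stored as 0-indexed tuples with g[a] the image of a; the helper names perm_matrix, compose, and matmul are hypothetical.

```python
from itertools import permutations

def perm_matrix(g):
    """Matrix of A_g: column a has a single 1 in row g[a]."""
    n = len(g)
    return tuple(
        tuple(1 if g[col] == row else 0 for col in range(n)) for row in range(n)
    )

def compose(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[a]] for a in range(len(h)))

def matmul(A, B):
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

S3 = list(permutations(range(3)))

# Condition (14.1): the matrix of gh is the product of the matrices of g and h.
is_representation = all(
    perm_matrix(compose(g, h)) == matmul(perm_matrix(g), perm_matrix(h))
    for g in S3
    for h in S3
)
print(is_representation)
```

The same check works for any n by replacing range(3), at the cost of (n!)^2 matrix products.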
Example 14.4 Let $G = S_3$ be the symmetric group of degree 3, and let $L$ be a
two-dimensional space with basis $e_1, e_2$. Let us define a vector $e_3$ by $e_3 = -(e_1 + e_2)$.
For the permutation

$g = \begin{pmatrix} 1 & 2 & 3 \\ j_1 & j_2 & j_3 \end{pmatrix}$

let us define $\mathcal{A}_g$ as the transformation such that

$\mathcal{A}_g(e_1) = e_{j_1}$, $\mathcal{A}_g(e_2) = e_{j_2}$.

It is easily verified that in this way, we obtain a two-dimensional representation of
the symmetric group $S_3$.
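The verification left to the reader can be sketched as follows, assuming 0-indexed permutations and coordinates taken in the basis e_1, e_2; the names E, rep, apply, compose, and matmul are hypothetical. The key point is that the transformation for g also sends e_3 to the vector indexed by g(3), because the three vectors sum to zero.

```python
from itertools import permutations

# Coordinates of e_1, e_2 and e_3 = -(e_1 + e_2) in the basis e_1, e_2.
E = [(1, 0), (0, 1), (-1, -1)]

def rep(g):
    """Matrix of A_g: its columns are the images e_{g(1)}, e_{g(2)}."""
    c1, c2 = E[g[0]], E[g[1]]
    return ((c1[0], c2[0]), (c1[1], c2[1]))

def apply(A, v):
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

def compose(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[a]] for a in range(3))

def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

S3 = list(permutations(range(3)))

# A_g permutes all three vectors e_1, e_2, e_3 according to g.
permutes_vectors = all(apply(rep(g), E[a]) == E[g[a]] for g in S3 for a in range(3))

# Condition (14.1).
is_representation = all(
    rep(compose(g, h)) == matmul(rep(g), rep(h)) for g in S3 for h in S3
)
print(permutes_vectors, is_representation)
```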


Example 14.5 Let $G = GL_2$ be the group of nonsingular matrices of order 2, and
let $L$ be the space of polynomials in the two variables $x$ and $y$ whose total degree in
both variables does not exceed $n$. For a nonsingular matrix

$g = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$

let us define $\mathcal{A}_g$ as the linear transformation of the space $L$ taking polynomials
$f(x, y)$ to $f(ax + by, cx + dy)$, that is,

$\mathcal{A}_g(f(x, y)) = f(ax + by, cx + dy)$.

It is easy to verify that relationship (14.1) is satisfied in this case, that is, we have
a representation of the group of nonsingular matrices of order 2. Its dimension is
equal to the dimension of the space of polynomials in $x$ and $y$ whose degree (in
both variables combined) does not exceed $n$; that is, as is easily seen, it is equal to
$(n + 1)(n + 2)/2$.


Example 14.6 For any group and an $n$-dimensional space $L$, the representation
defined by the formula $\mathcal{A}_g = \mathcal{E}$, where $\mathcal{E}$ is the identity transformation on the space $L$,
is called the $n$-dimensional identity representation.

In the definition of a representation, the space $L$ can also be infinite-dimensional.
In this case, the representation is also said to be infinite-dimensional. For example,
defining a representation just as in Example 14.5, but taking for $L$ the space of all
continuous functions, we obtain an infinite-dimensional representation. In the
sequel, we shall consider only finite-dimensional representations, and we shall always
consider the space $L$ to be complex.


Example 14.7 Representations of the symmetric group $S_n$ are of interest in many
problems. All such representations are known, but we shall describe here only the
one-dimensional representations of the group $S_n$. In this case, a nonsingular linear
transformation $\mathcal{A}_g$ is given by a matrix of order 1, that is, a single complex number
(which, of course, is nonzero). We thereby arrive at a function on the group taking
numeric values. Let us denote this function by $\varphi(g)$. Then by definition, it must
satisfy the conditions $\varphi(g) \neq 0$ and

$\varphi(gh) = \varphi(g)\varphi(h)$ (14.5)

for all elements $g$ and $h$ in the group $S_n$.

It is easy to find all possible values $\varphi(\tau)$ if $\tau$ is a transposition. Namely, setting
$g = h = \tau$ and using the facts that $\tau^2 = e$ (the identity transformation) and that
obviously, $\varphi(e) = 1$, we obtain from relationship (14.5) the equality $\varphi(\tau)^2 = 1$, from
which follows $\varphi(\tau) = \pm 1$. It is theoretically possible that for some transpositions,
$\varphi(\tau) = 1$, while for others, $\varphi(\tau) = -1$. However, in reality, such is not the case, and
one of the equalities $\varphi(\tau) = 1$ and $\varphi(\tau) = -1$ holds for all transpositions $\tau$, with
the choice of sign depending only on the one-dimensional representation $\varphi$. Let us
prove this.

Let $\tau = \tau_{a,b}$ and $\tau' = \tau_{c,d}$ be two transpositions, where $a, b, c, d$ are elements of
the set $M$ (see formula (13.3)). Obviously, there exists a permutation $g$ of the set $M$
such that $g(c) = a$ and $g(d) = b$. Then as is easily verified, based on the definition
of a transposition, we have the equality $g^{-1}\tau_{a,b}\,g = \tau_{c,d}$, that is, $\tau' = g^{-1}\tau g$. In
view of relationships (14.2), (14.4), and (14.5), we obtain from the last equality that

$\varphi(\tau') = \varphi(g)^{-1}\varphi(\tau)\varphi(g) = \varphi(\tau)$,

which proves our assertion for all transpositions $\tau$ and $\tau'$. We shall now make use
of the fact that every element $g$ of the group $S_n$ is the product of a finite number
of transpositions; see formula (13.4). Taking the aforesaid into account, it follows
from this that

$\varphi(g) = \varphi(\tau_{a_1,b_1})\varphi(\tau_{a_2,b_2}) \cdots \varphi(\tau_{a_k,b_k}) = \varphi(\tau)^k$, (14.6)

where $\varphi(\tau) = +1$ or $-1$.

Thus there are two possible cases. The first case is that for all transpositions
$\tau \in S_n$, the number $\varphi(\tau)$ is equal to 1. In view of formula (14.6), for every
permutation $g \in S_n$, we have $\varphi(g) = 1$, that is, the function $\varphi$ on $S_n$ is identically equal to
1, and therefore, it gives the one-dimensional identity representation of the group $S_n$.
The second case is that for all transpositions $\tau \in S_n$, we have $\varphi(\tau) = -1$. Then, in
view of formula (14.6), for a permutation $g \in S_n$, we have $\varphi(g) = (-1)^k$, where $k$
corresponds to the parity of the permutation $g$. In other words, $\varphi(g) = 1$ if the
permutation $g$ is even, and $\varphi(g) = -1$ if the permutation $g$ is odd. From relationship
(13.4), it follows at once that such a function $\varphi$ indeed determines a one-dimensional
representation of the group $S_n$, which we denote by $\varepsilon(g)$.

Thus we have obtained the following result: the symmetric group $S_n$ has exactly
two one-dimensional representations: the identity and $\varepsilon(g)$.
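For small n this result can be confirmed by brute force. The sketch below, with hypothetical helper names compose and sign, checks that the sign of a permutation (computed here from its number of inversions, an assumption standing in for the count of transpositions) satisfies (14.5) on the symmetric group of degree 4, and then enumerates all functions on the symmetric group of degree 3 with values 1 and -1 that satisfy (14.5). Restricting to these two values is justified by the argument above.

```python
from itertools import permutations, product

def compose(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[a]] for a in range(len(h)))

def sign(g):
    """(-1)^k, computed from the number of inversions, which has the parity of k."""
    inversions = sum(
        1 for i in range(len(g)) for j in range(i + 1, len(g)) if g[i] > g[j]
    )
    return -1 if inversions % 2 else 1

S4 = list(permutations(range(4)))
sign_is_multiplicative = all(
    sign(compose(g, h)) == sign(g) * sign(h) for g in S4 for h in S4
)

# Brute force: all maps S_3 -> {1, -1} satisfying relationship (14.5).
S3 = list(permutations(range(3)))
homomorphisms = []
for values in product((1, -1), repeat=len(S3)):
    phi = dict(zip(S3, values))
    if all(phi[compose(g, h)] == phi[g] * phi[h] for g in S3 for h in S3):
        homomorphisms.append(phi)

print(sign_is_multiplicative, len(homomorphisms))
```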

One-dimensional representations of the group $S_n$ and related groups (such as the
alternating group $A_n$) play a large role in a variety of questions in algebra. For
example, one of the best-known results in algebra is the derivation of formulas for
the solution of equations of degrees 3 and 4. For a long time, mathematicians were
thwarted in their attempts to find analogous formulas for equations of degree 5 and
higher. Finally, it was proved that such an attempt was futile, that is, that there exists
no formula that expresses the roots of a polynomial equation of degree 5 or greater
in terms of its coefficients using the usual arithmetic operations and the extraction
of roots of arbitrary degree. A key point in the proof of this assertion was the
establishment of the fact that the alternating group $A_n$ for $n \ge 5$ has no one-dimensional
representation other than the identity. For $n = 3$ and 4, such representations of the
group $A_n$ exist, and that is what explains the existence of formulas for the solution
of equations of those degrees.

Now let us establish what representations we shall consider to be identical.

Definition 14.8 Two representations $g \mapsto \mathcal{A}_g$ and $g \mapsto \mathcal{A}'_g$ of the same group $G$
with spaces $L$ and $L'$ of the same dimension are said to be equivalent if there exists
an isomorphism $\mathcal{C} : L' \to L$ of the vector spaces $L'$ and $L$ such that

$\mathcal{A}'_g = \mathcal{C}^{-1}\mathcal{A}_g\,\mathcal{C}$ (14.7)

for every element $g \in G$.

Let $e'_1, \ldots, e'_n$ be a basis of the space $L'$ and let $e_1 = \mathcal{C}(e'_1), \ldots, e_n = \mathcal{C}(e'_n)$ be
the corresponding basis of the space $L$, since the linear transformation $\mathcal{C} : L' \to L$
is an isomorphism. Comparing relationship (14.7) with the change-of-matrix
formula (3.43), we see that this definition means that the matrix of the transformation
$\mathcal{A}'_g$ with basis $e'_1, \ldots, e'_n$ coincides with the matrix of the transformation $\mathcal{A}_g$ with
basis $e_1, \ldots, e_n$. Thus the representations $\mathcal{A}_g$ and $\mathcal{A}'_g$ are equivalent if and only if
one can choose bases in the spaces $L$ and $L'$ such that for each element $g \in G$, the
transformations $\mathcal{A}_g : L \to L$ and $\mathcal{A}'_g : L' \to L'$ have identical matrices.

Let $g \mapsto \mathcal{A}_g$ be a representation of the group $G$, and let $L$ be its representation
space. A subspace $M \subset L$ is said to be invariant with respect to the representation $\mathcal{A}_g$
if it is invariant with respect to all linear transformations $\mathcal{A}_g : L \to L$ for all $g \in G$.
Let us denote by $\mathcal{B}_g$ the restriction of $\mathcal{A}_g$ to the subspace $M$. It is obvious that $\mathcal{B}_g$
is a representation of the group $G$ with representation space $M$. The representation
$\mathcal{B}_g$ is said to be the representation induced by the representation $\mathcal{A}_g$ with invariant
subspace $M$. This is also expressed by saying that the representation $\mathcal{B}_g$ is contained
in the representation $\mathcal{A}_g$.

Example 14.9 Let us consider the $n$-dimensional representation of the group $S_n$
described in Example 14.3. As is easily verified, the collection of all vectors of the
form $\sum_{a \in M} \alpha_a e_a$, where $\alpha_a$ is an arbitrary scalar satisfying $\sum_{a \in M} \alpha_a = 0$, forms
a subspace $L' \subset L$ of dimension $n - 1$, invariant with respect to this representation.
The representation thus induced in $L'$ is an $(n - 1)$-dimensional representation of
the group $S_n$. In the case $n = 3$, it is equivalent to the representation of the group $S_3$
described in Example 14.4.
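The equivalence claimed for n = 3 can be made concrete by exhibiting suitable vectors in the subspace of coordinate sum zero. The sketch below is an illustration under assumptions: the vectors u_a = 3 e_a - (e_1 + e_2 + e_3) and the helper name act are choices made here, not taken from the text. These three vectors lie in the subspace, sum to zero, and are permuted exactly like e_1, e_2, e_3 in Example 14.4, so in the basis u_1, u_2 the induced representation has the same matrices as the representation of Example 14.4.

```python
from itertools import permutations

n = 3

def act(g, v):
    """Apply A_g from Example 14.3: the coefficient of e_a moves to e_{g(a)}."""
    w = [0] * n
    for a in range(n):
        w[g[a]] += v[a]
    return tuple(w)

# Hypothetical choice: u_a = n*e_a - (e_1 + ... + e_n), coordinates in e_1, ..., e_n.
U = [tuple((n if i == a else 0) - 1 for i in range(n)) for a in range(n)]

S3 = list(permutations(range(n)))

in_subspace = all(sum(u) == 0 for u in U)                      # coordinate sum zero
u3_relation = U[2] == tuple(-x - y for x, y in zip(U[0], U[1]))  # u_3 = -(u_1 + u_2)
permuted = all(act(g, U[a]) == U[g[a]] for g in S3 for a in range(n))

print(in_subspace, u3_relation, permuted)
```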

Example 14.10 In Example 14.5, let us denote by $M_k$ ($k = 0, \ldots, n$) the subspace
consisting of polynomials of degree at most $k$ in the variables $x$ and $y$. Each of the $M_k$
is an invariant subspace of every $M_l$ with index $l > k$.
Definition 14.11 A representation is said to be reducible if its representation space
$L$ has an invariant subspace different from (0) and from all of $L$. Otherwise, it is said
to be irreducible.

Examples 14.3 and 14.5 give reducible representations. Clearly, the $n$-dimensional
identity representation is reducible if $n > 1$: every subspace of the representation
space is invariant. Every one-dimensional representation is irreducible.

Let us prove that the representation in Example 14.4 is irreducible. Indeed, any
invariant subspace different from (0) and $L$ must be one-dimensional. Let $u$ be a
basis vector of such a subspace. The condition of invariance means that

$\mathcal{A}_g(u) = \lambda_g u$

for every $g \in S_3$, where $\lambda_g$ is some scalar depending on the element $g$, that is, $u$
is a common eigenvector for all transformations $\mathcal{A}_g$. It is easy to verify that this is
impossible: the eigenvectors of the transformation $\mathcal{A}_{g_1}$ with
$g_1 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$ have the
form $\alpha(e_1 + e_2)$ and $\beta(e_1 - e_2)$, and the eigenvectors of the transformation $\mathcal{A}_{g_2}$ with
$g_2 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$ have the form $\gamma e_2$ and $\delta(2e_1 + e_2)$, and these clearly cannot coincide.
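The eigenvector computation can be checked directly. In this sketch the two matrices are written out by hand in the basis e_1, e_2, assuming that the first permutation swaps 1 and 2 and the second swaps 1 and 3; the helper names apply and cross are hypothetical. A vector v is an eigenvector of a 2x2 matrix exactly when v and its image are proportional, which is tested with a 2x2 determinant.

```python
A_g1 = ((0, 1), (1, 0))      # g_1 swaps e_1 and e_2
A_g2 = ((-1, 0), (-1, 1))    # g_2 sends e_1 to e_3 = -(e_1 + e_2) and fixes e_2

def apply(A, v):
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

def cross(v, w):
    """Determinant of the pair (v, w); zero exactly when v and w are proportional."""
    return v[0] * w[1] - v[1] * w[0]

eig_g1 = [(1, 1), (1, -1)]   # e_1 + e_2 and e_1 - e_2
eig_g2 = [(0, 1), (2, 1)]    # e_2 and 2e_1 + e_2

are_eigenvectors = all(cross(v, apply(A_g1, v)) == 0 for v in eig_g1) and all(
    cross(v, apply(A_g2, v)) == 0 for v in eig_g2
)

# No eigenline of A_g1 coincides with an eigenline of A_g2.
no_common_eigenvector = all(cross(v, w) != 0 for v in eig_g1 for w in eig_g2)

print(are_eigenvectors, no_common_eigenvector)
```

Each matrix has the two distinct eigenvalues 1 and -1, so its eigenvectors are exactly the nonzero multiples of the two vectors listed, and the absence of a common line rules out a one-dimensional invariant subspace.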

Definition 14.12 A representation $\mathcal{A}_g$ is said to be the direct sum of the $r$
representations

$\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$

if its representation space $L$ is the direct sum of the $r$ invariant subspaces

$L = L_1 \oplus \cdots \oplus L_r$, (14.8)

and $\mathcal{A}_g$ induces in every $L_i$ a representation equivalent to $\mathcal{A}_g^{(i)}$, $i = 1, \ldots, r$.

Example 14.13 The $n$-dimensional identity representation is the direct sum of $n$
one-dimensional identity representations. To convince oneself of this, it suffices to
decompose the space of this representation in some way into a direct sum of
one-dimensional subspaces.

Example 14.14 In the situation of Example 14.9, let us denote by $L_1$ the invariant
subspace $L'$ of dimension $n - 1$, and let us denote by $L_2$ the one-dimensional
subspace spanned by the vector $\sum_{a \in M} e_a$. Clearly, $L_2$ is also an invariant subspace
of this representation, and we have the decomposition $L = L_1 \oplus L_2$. In particular,
the representation introduced in Example 14.3, for $n = 3$, is the direct sum of the
representation of Example 14.4 and the one-dimensional identity representation.

It can happen that the representation space $L$ has an invariant subspace $L_1$, yet it
is impossible to find a complementary invariant subspace $L_2$ such that $L = L_1 \oplus L_2$.
In other words, the representation is reducible, but it is not the direct sum of two
other representations.

Example 14.15 Let $G = \{g\}$ be an infinite cyclic group, and let $L$ be a
two-dimensional space with basis $e_1, e_2$. Let us denote by $\mathcal{A}_n$ the transformation having
in this basis the matrix $\begin{pmatrix} 1 & 0 \\ n & 1 \end{pmatrix}$. It is obvious that $\mathcal{A}_n\mathcal{A}_m = \mathcal{A}_{n+m}$. From this, it
follows that on setting $\mathcal{A}_{g^n} = \mathcal{A}_n$, we obtain a representation of the group $G$. The line
$L_1 = \langle e_2 \rangle$ is an invariant subspace: $\mathcal{A}_n(e_2) = e_2$. However, there are no other
invariant subspaces. Thus, for instance, the transformation $\mathcal{A}_1$ has no eigenvectors other
than multiples of $e_2$. Therefore, our representation is reducible, but it is not a direct sum.
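The claims of this example can be tested numerically. The sketch below assumes that the matrix of the n-th transformation has rows (1, 0) and (n, 1), with columns holding the images of e_1 and e_2; this form is a reconstruction consistent with the stated properties rather than something the text spells out, and the helper names A, matmul, apply, and cross are hypothetical.

```python
def A(n):
    """Assumed matrix of A_n: e_1 -> e_1 + n*e_2, e_2 -> e_2."""
    return ((1, 0), (n, 1))

def matmul(P, Q):
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(P, v):
    return (P[0][0] * v[0] + P[0][1] * v[1], P[1][0] * v[0] + P[1][1] * v[1])

def cross(v, w):
    """Zero exactly when v and w are proportional."""
    return v[0] * w[1] - v[1] * w[0]

exponents = range(-3, 4)
additive = all(matmul(A(n), A(m)) == A(n + m) for n in exponents for m in exponents)
fixes_e2 = all(apply(A(n), (0, 1)) == (0, 1) for n in exponents)

# On a small grid, every vector with nonzero e_1-coordinate fails to be an
# eigenvector of A_1, so the line spanned by e_2 is the only eigenline found.
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4) if x != 0]
no_other_eigenvectors = all(cross(v, apply(A(1), v)) != 0 for v in grid)

print(additive, fixes_e2, no_other_eigenvectors)
```

For a vector (x, y) the determinant of the pair formed by the vector and its image under the first transformation equals x squared, which explains why only vectors with x = 0 can be eigenvectors.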

Let us note that in Example 14.15, the group $G$ was infinite. It turns out that for
finite groups, such a phenomenon cannot occur. Namely, in the following section,
it will be proved that if a representation $\mathcal{A}_g$ of a finite group is reducible, that is,
the vector space $L$ of this representation contains an invariant subspace $L_1$, then
$L$ is the direct sum of $L_1$ and another invariant subspace $L_2$. Hence it follows that
every representation of a finite group is the direct sum of irreducible representations.
As regards irreducible representations, it will be proved in Sect. 14.3 that (up to
equivalence) there is only a finite number of them.

From this point on, to the end of this book, we shall always assume that a group
$G$ is finite, with the sole exception of Example 14.36.


14.2 Representations of Finite Groups

The proof of the fundamental property of representations of finite groups formulated
at the end of the preceding section uses several properties of complex vector spaces.

Let us consider a representation of a finite group $G$. Let $L$ be its representation
space. Let us define on $L$ some Hermitian form $\varphi(x, y)$ for which the corresponding
quadratic Hermitian form $\psi(x) = \varphi(x, x)$ is positive definite, and thus it takes
positive values for all $x \neq 0$. For example, if $L = \mathbb{C}^n$, then for vectors $x$ and $y$ with
coordinates $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$, let us set

$\varphi(x, y) = \sum_{i=1}^{n} x_i \bar{y}_i$.

In the sequel, we shall denote $\varphi(x, y)$ by $(x, y)$ and call it a scalar product in the
space $L$. The concepts and simple results that we proved in Chap. 7 for Euclidean
spaces can be transferred to this case verbatim. Let us list those of them that we are
now going to use:

1. The orthogonal complement of a subspace $L' \subset L$ is the collection of all
   vectors $y \in L$ for which $(x, y) = 0$ for all $x \in L'$. The orthogonal complement of
   a subspace $L'$ is itself a subspace of $L$ and is denoted by $(L')^{\perp}$. We have the
   decomposition $L = L' \oplus (L')^{\perp}$.

2. A unitary transformation (the analogue of orthogonal transformation for the case
   of a complex space) is a linear transformation $\mathcal{U} : L \to L$ such that for all vectors
   $x, y \in L$, we have the relationship

   $(\mathcal{U}(x), \mathcal{U}(y)) = (x, y)$.

3. The complex analogue of Theorem 7.24 is this: if a subspace $L' \subset L$ is invariant
   with respect to a unitary transformation $\mathcal{U}$, then its orthogonal complement $(L')^{\perp}$
   is also invariant with respect to $\mathcal{U}$.

Definition 14.16 A representation $\mathcal{U}_g$ of a group $G$ is said to be unitarizable if it
is possible to introduce a scalar product on its representation space $L$ such that all
transformations $\mathcal{U}_g$ become unitary.

The property of a representation being unitarizable obviously remains true under
a change to an equivalent representation.

Indeed, let $g \mapsto \mathcal{U}_g$ be a unitarizable representation of some group $G$ with space
$L$ and Hermitian form $\varphi(x, y)$. Let us consider an arbitrary isomorphism $\mathcal{C} : L' \to L$.
As we know, it determines an equivalent representation $g \mapsto \mathcal{U}'_g$ of the same group
with space $L'$. Let us show that the representation $g \mapsto \mathcal{U}'_g$ is also unitarizable. As
the scalar product in $L'$ let us choose the form defined by the relationship

$\psi(u, v) = \varphi(\mathcal{C}(u), \mathcal{C}(v))$ (14.9)

for vectors $u, v \in L'$. It is obvious that $\psi(u, v)$ is a Hermitian form on $L'$ and that
$\psi(u, u) > 0$ for every nonnull vector $u \in L'$. Let us verify that the scalar product
$\psi(u, v)$ indeed establishes the unitarizability of the representation $g \mapsto \mathcal{U}'_g$.
Substituting the vectors $\mathcal{U}'_g(u)$ and $\mathcal{U}'_g(v)$ into equality (14.9), taking into account (14.7)
and the unitarizability of the representation $g \mapsto \mathcal{U}_g$, we obtain the relationship

$\psi(\mathcal{U}'_g(u), \mathcal{U}'_g(v)) = \psi(\mathcal{C}^{-1}\mathcal{U}_g\mathcal{C}(u), \mathcal{C}^{-1}\mathcal{U}_g\mathcal{C}(v))$
$= \varphi(\mathcal{U}_g\mathcal{C}(u), \mathcal{U}_g\mathcal{C}(v)) = \varphi(\mathcal{C}(u), \mathcal{C}(v)) = \psi(u, v)$,

which means that the representation $g \mapsto \mathcal{U}'_g$ is unitarizable.

Lemma 14.17 If a space $L$ of a unitarizable representation $\mathcal{U}_g$ of a group $G$
contains an invariant subspace $L'$, then it also contains a second invariant subspace $L''$
such that $L = L' \oplus L''$.

Proof Let us take as $L''$ the orthogonal complement $(L')^{\perp}$. Then the space $L''$ is
invariant with respect to all transformations $\mathcal{U}_g$, and we have the decomposition
$L = L' \oplus L''$. □

The application of this lemma to representations of finite groups is based on the
following fundamental fact.

Theorem 14.18 Every representation $\mathcal{A}_g$ of a finite group $G$ is unitarizable.

Proof Let us introduce a scalar product on the representation space $L$ in such a way
that all linear transformations $\mathcal{A}_g$ become unitary. For this, let us take an arbitrary
scalar product $[x, y]$ in the space $L$, defined by an arbitrary Hermitian form $\varphi(x, y)$,
such that the associated quadratic form $\varphi(x, x)$ is positive definite: $\varphi(x, x) > 0$ for
every $x \neq 0$. Let us now set

$(x, y) = \sum_{g \in G} [\mathcal{A}_g(x), \mathcal{A}_g(y)]$, (14.10)

where the sum is taken over all elements $g$ of the group $G$. We shall prove that
$(x, y)$ is also a scalar product and that with respect to it, all transformations $\mathcal{A}_g$ are
unitary.

The required properties of a scalar product for $(x, y)$ derive from the analogous
properties of $[x, y]$ and from the fact that $\mathcal{A}_g$ is a linear transformation:

1. $(y, x) = \sum_{g \in G} [\mathcal{A}_g(y), \mathcal{A}_g(x)] = \sum_{g \in G} \overline{[\mathcal{A}_g(x), \mathcal{A}_g(y)]} = \overline{(x, y)}$,

2. $(\lambda x, y) = \sum_{g \in G} [\mathcal{A}_g(\lambda x), \mathcal{A}_g(y)] = \sum_{g \in G} \lambda [\mathcal{A}_g(x), \mathcal{A}_g(y)] = \lambda (x, y)$,

3. $(x_1 + x_2, y) = \sum_{g \in G} [\mathcal{A}_g(x_1 + x_2), \mathcal{A}_g(y)]$
   $= \sum_{g \in G} [\mathcal{A}_g(x_1) + \mathcal{A}_g(x_2), \mathcal{A}_g(y)] = (x_1, y) + (x_2, y)$,

4. $(x, x) = \sum_{g \in G} [\mathcal{A}_g(x), \mathcal{A}_g(x)] > 0$, if $x \neq 0$.

For the proof of the last property, it is necessary to observe that in this sum, all
terms $[\mathcal{A}_g(x), \mathcal{A}_g(x)]$ are positive. This follows from the analogous property of the
scalar product $[x, y]$, that is, from the fact that $[x, x] > 0$ for all $x \neq 0$. Since the
linear transformation $\mathcal{A}_g : L \to L$ is nonsingular, it takes every nonnull vector $x$ to a
nonnull vector $\mathcal{A}_g(x)$.

Let us now verify that with respect to the scalar product $(x, y)$, every
transformation $\mathcal{A}_h$, $h \in G$, is unitary. In view of (14.10), we have

$(\mathcal{A}_h(x), \mathcal{A}_h(y)) = \sum_{g \in G} [\mathcal{A}_g(\mathcal{A}_h(x)), \mathcal{A}_g(\mathcal{A}_h(y))]$
$= \sum_{g \in G} [\mathcal{A}_g\mathcal{A}_h(x), \mathcal{A}_g\mathcal{A}_h(y)]$. (14.11)

Let us set $gh = u$. In view of property (14.1), we have $\mathcal{A}_g\mathcal{A}_h = \mathcal{A}_{gh} = \mathcal{A}_u$.
Therefore, we may rewrite equality (14.11) in the form

$(\mathcal{A}_h(x), \mathcal{A}_h(y)) = \sum_{u = gh} [\mathcal{A}_u(x), \mathcal{A}_u(y)]$. (14.12)

Let us now observe that as $g$ runs through all elements of the group $G$ while $h$
is fixed, the element $u = gh$ also runs through all elements of the group $G$. This
follows from the fact that for every element $u \in G$, the element $g = uh^{-1}$ satisfies
the relationship $gh = u$, and that for distinct $g_1$ and $g_2$, we thereby obtain distinct
elements $u_1$ and $u_2$.

Thus in equality (14.12), the element $u$ runs through the entire group $G$, and we
can rewrite this equality in the form

$(\mathcal{A}_h(x), \mathcal{A}_h(y)) = \sum_{g \in G} [\mathcal{A}_g(x), \mathcal{A}_g(y)]$,

whence in view of definition (14.10), it follows that $(\mathcal{A}_h(x), \mathcal{A}_h(y)) = (x, y)$, that
is, the transformation $\mathcal{A}_h$ is unitary with respect to the scalar product $(x, y)$. □
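The averaging construction (14.10) can be carried out explicitly for the two-dimensional representation of the symmetric group of degree 3 from Example 14.4. The following is a sketch under simplifying assumptions: the matrices are real, so complex conjugation plays no role, the starting scalar product is the standard dot product in the basis e_1, e_2, and the names E, rep, matmul, transpose, and H are hypothetical. The matrix H is the Gram matrix of the averaged scalar product, and invariance means that transposing a group matrix, multiplying by H, and multiplying by the group matrix again returns H.

```python
from itertools import permutations

E = [(1, 0), (0, 1), (-1, -1)]          # e_1, e_2, e_3 = -(e_1 + e_2)

def rep(g):
    """Matrix of A_g in the basis e_1, e_2 (columns are images of basis vectors)."""
    c1, c2 = E[g[0]], E[g[1]]
    return ((c1[0], c2[0]), (c1[1], c2[1]))

def matmul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transpose(A):
    return tuple(zip(*A))

group = [rep(g) for g in permutations(range(3))]

# Gram matrix of (x, y) = sum over g of [A_g x, A_g y], with [x, y] the dot product.
H = tuple(
    tuple(sum(matmul(transpose(A), A)[i][j] for A in group) for j in range(2))
    for i in range(2)
)

# Every A_h preserves the averaged form, while the starting form is not preserved.
invariant = all(matmul(matmul(transpose(A), H), A) == H for A in group)
standard_is_invariant = all(
    matmul(transpose(A), A) == ((1, 0), (0, 1)) for A in group
)

print(H, invariant, standard_is_invariant)
```

The averaged form is positive definite here because its leading entry and its determinant are both positive.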

Corollary 14.19 If the space $L$ of a representation of a finite group contains an
invariant subspace $L'$, then it contains another invariant subspace $L''$ such that $L =
L' \oplus L''$.


This follows directly from Lemma 14.17 and from Theorem 14.18.

Corollary 14.20 Every representation of a finite group is a direct sum of irreducible
representations.


Proof If the space $L$ of our representation $\mathcal{A}_g$ does not have an invariant subspace
different from (0) and all of $L$, then this representation itself is irreducible, and our
assertion is true (although trivially so). But if the space $L$ has an invariant subspace
$L'$, then by Corollary 14.19, there exists an invariant subspace $L''$ such that $L =
L' \oplus L''$.


Let us apply the same argument to each of the spaces $L'$ and $L''$. Continuing this
process, we will eventually come to a halt, since the dimensions of the obtained
subspaces are continually decreasing. As a result, we arrive at such a decomposition
(14.8) with some number $r \ge 2$ such that the invariant subspaces $L_i$ contain
no invariant subspaces other than (0) and all of $L_i$. This means precisely that the
representations $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$ induced in the subspaces $L_1, \ldots, L_r$ by our
representation $\mathcal{A}_g$ are irreducible, and the representation $\mathcal{A}_g$ decomposes as a direct sum
of $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$. □


Theorem 14.21 If a representation $\mathcal{A}_g$ decomposes into a direct sum of irreducible
representations $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$, then every irreducible representation $\mathcal{B}_g$ contained
in $\mathcal{A}_g$ is equivalent to one of the $\mathcal{A}_g^{(i)}$.


Proof Let $L = L_1 \oplus \cdots \oplus L_r$ be a decomposition of the space $L$ of the
representation $\mathcal{A}_g$ into a direct sum of invariant subspaces such that $\mathcal{A}_g$ induces in $L_i$ the
representation $\mathcal{A}_g^{(i)}$, and let $M$ be the invariant subspace of $L$ in which $\mathcal{A}_g$ induces the
representation $\mathcal{B}_g$.

Then in particular, for every vector $x \in M$, we have the decomposition

$x = x_1 + \cdots + x_r$, $x_i \in L_i$. (14.13)

It determines a linear transformation $\mathcal{P}_i : M \to L_i$ that is the projection of the
subspace $M$ onto $L_i$ parallel to $L_1 \oplus \cdots \oplus L_{i-1} \oplus L_{i+1} \oplus \cdots \oplus L_r$; see Example 3.51 on
p. 103. In other words, the transformations $\mathcal{P}_i : M \to L_i$ are defined by the
conditions

$\mathcal{P}_i(x) = x_i$, $i = 1, \ldots, r$. (14.14)

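The proof of the theorem is based on the relationships

$$\mathcal{A}_g \mathcal{P}_i(\boldsymbol{x}) = \mathcal{P}_i \mathcal{A}_g(\boldsymbol{x}), \quad i = 1, \ldots, r, \tag{14.15}$$

which are valid for every vector $\boldsymbol{x} \in \mathsf{M}$. For the proof of relationships (14.15),
let us apply the transformation $\mathcal{A}_g$ to both sides of equality (14.13). We then obtain

$$\mathcal{A}_g(\boldsymbol{x}) = \mathcal{A}_g(\boldsymbol{x}_1) + \cdots + \mathcal{A}_g(\boldsymbol{x}_r). \tag{14.16}$$

Since $\mathcal{A}_g(\boldsymbol{x}) \in \mathsf{M}$ and $\mathcal{A}_g(\boldsymbol{x}_i) \in \mathsf{L}_i$, $i = 1, \ldots, r$,
it follows that relationship (14.16) is decomposition (14.13) for the vector
$\mathcal{A}_g(\boldsymbol{x})$, whence follows equality (14.15).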

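From the irreducibility of the representations $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$ and
$\mathcal{B}_g$, it follows that the projection $\mathcal{P}_i$ defined by formula (14.14) is either
identically zero or an isomorphism of the spaces $\mathsf{M}$ and $\mathsf{L}_i$. Indeed, let the vector
$\boldsymbol{x} \in \mathsf{M}$ be contained in the kernel of the transformation $\mathcal{P}_i$, that is,
$\mathcal{P}_i(\boldsymbol{x}) = \boldsymbol{0}$. Then clearly, $\mathcal{A}_g \mathcal{P}_i(\boldsymbol{x}) = \boldsymbol{0}$,
and in view of relationship (14.15), we obtain that $\mathcal{P}_i \mathcal{A}_g(\boldsymbol{x}) = \boldsymbol{0}$,
that is, the vector $\mathcal{A}_g(\boldsymbol{x})$ is also contained in the kernel of $\mathcal{P}_i$. The
kernel is therefore an invariant subspace of $\mathsf{M}$, and from the irreducibility of the
representation $\mathcal{B}_g$, it now follows that the kernel either is equal to $(\boldsymbol{0})$ or
coincides with the entire space $\mathsf{M}$ (in the latter case, the projection $\mathcal{P}_i$ will
obviously be the null transformation). In exactly the same way, from equality (14.15) and
the irreducibility of $\mathcal{A}_g^{(i)}$, it follows that the image of the transformation
$\mathcal{P}_i$ either equals $(\boldsymbol{0})$ or coincides with the subspace $\mathsf{L}_i$.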

However, there is certainly at least one index $i$ among the numbers $1, \ldots, r$
for which the transformation $\mathcal{P}_i$ is not identically zero. For this, we take an
arbitrary nonnull vector $\boldsymbol{x} \in \mathsf{M}$; at least one of its components $\boldsymbol{x}_i$ in the
decomposition (14.13) is not equal to zero, and therefore, $\mathcal{P}_i(\boldsymbol{x}) \ne \boldsymbol{0}$.
Taking into account the previous arguments, this shows that the corresponding
transformation $\mathcal{P}_i$ is an isomorphism of the vector spaces $\mathsf{M}$ and $\mathsf{L}_i$, and
relationship (14.15) shows the equivalence of the corresponding representations
$\mathcal{B}_g$ and $\mathcal{A}_g^{(i)}$. □

Corollary 14.22 A given representation contains only finitely many distinct (in the
sense of equivalence) irreducible representations.

Indeed, all irreducible representations contained in the given one are equivalent
to one of those encountered in an arbitrary decomposition of this representation as
a direct sum of irreducible representations.

Remark 14.23 From Theorem 14.21 there follows a certain property of uniqueness
of the decompositions of a representation into irreducible representations. Namely,
however we decompose a representation, we shall encounter in the decomposition
the same (up to equivalence) irreducible representations. Indeed, let us select a
certain decomposition of our representation into irreducible representations. An
irreducible representation encountered in any other decomposition is contained in our
representation, which means that by Theorem 14.21, it is equivalent to one of the terms
of the chosen decomposition. A stronger property of uniqueness consists in the fact
that if in one decomposition there appear $k$ terms equivalent to a given irreducible
representation, then the same number of such terms will appear as well in every
other decomposition. We shall not require this assertion in the sequel, and we shall
therefore not prove it.


14.3 Irreducible Representations

In this section, we shall prove that a finite group has only a finite number of distinct
(up to equivalence) irreducible representations. We shall accomplish this as follows:
We shall construct one particularly important representation called the regular
representation, for which we then shall prove that every irreducible representation is
contained within it. The finiteness of the number of such representations will then
result from Corollary 14.22. The space of the regular representation consists of all
possible functions on the group. This is a special case of the general notion of the
space of functions on an arbitrary set (see Example 3.36, p. 94).

For an arbitrary finite group $G$, let us consider the vector space $\mathsf{M}(G)$ of functions
on this group. Since the group $G$ is finite, the space $\mathsf{M}(G)$ has finite dimension:
$\dim \mathsf{M}(G) = |G|$.

Definition 14.24 The regular representation of a group $G$ is the representation $\mathcal{R}_g$
whose representation space is the space $\mathsf{M}(G)$ of functions on the group $G$, and in
which the element $g \in G$ is associated with the linear transformation $\mathcal{R}_g$ that takes
the function $f(h) \in \mathsf{M}(G)$ to the function $\varphi(h) = f(hg)$:

$$(\mathcal{R}_g(f))(h) = f(hg). \tag{14.17}$$

Formula (14.17) means that the result of applying the linear transformation $\mathcal{R}_g$
to the function $f$ is a “translated” function, in the sense that the value of
$\mathcal{R}_g(f)$ on the element $h \in G$ is equal to $f(hg)$. We shall omit the obvious
verification of the fact that the transformation of the space $\mathsf{M}(G)$ thus obtained is
linear. Let us verify that $\mathcal{R}_g$ is a representation, that is, that it satisfies the
requirements (14.1).

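Let us set $\mathcal{R}_{g_1 g_2}(f) = \varphi$. By formula (14.17), we have

$$\varphi(h) = f(h g_1 g_2).$$

Let $\mathcal{R}_{g_2}(f) = \psi$. Then

$$\psi(u) = f(u g_2).$$

Finally, if $\mathcal{R}_{g_1} \mathcal{R}_{g_2}(f) = \varphi_1$, then $\varphi_1 = \mathcal{R}_{g_1}(\psi)$ and
$\varphi_1(u) = \psi(u g_1)$. Replacing $u$ by $u g_1$ in the previous formula, we obtain that
$\varphi_1(u) = \psi(u g_1) = f(u g_1 g_2)$ for every element $u \in G$. This means that
$\varphi = \varphi_1$ and $\mathcal{R}_{g_1 g_2} = \mathcal{R}_{g_1} \mathcal{R}_{g_2}$.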
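
The verification above can also be checked mechanically on a small group. The following is a minimal sketch, assuming the symmetric group $S_3$ encoded as permutation tuples; the helper names `compose`, `regular_matrix`, and `matmul` are hypothetical and serve only this illustration. A function on $G$ is stored as the list of its values in a fixed ordering of the elements, so that $\mathcal{R}_g$ becomes a $6 \times 6$ matrix.

```python
from itertools import permutations

# Elements of S_3 as tuples p, where p[i] is the image of i (an assumed encoding).
G = list(permutations(range(3)))
index = {g: k for k, g in enumerate(G)}

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def regular_matrix(g):
    """Matrix of R_g acting on value vectors, per formula (14.17).

    (R_g f)(h) = f(hg), so row h has a single 1 in column hg.
    """
    n = len(G)
    m = [[0] * n for _ in range(n)]
    for h in G:
        m[index[h]][index[compose(h, g)]] = 1
    return m

def matmul(a, b):
    """Plain product of two square integer matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

identity = [[int(i == j) for j in range(len(G))] for i in range(len(G))]
e = tuple(range(3))

# R_e is the identity, and R_{g1 g2} = R_{g1} R_{g2} for every pair g1, g2.
is_representation = regular_matrix(e) == identity and all(
    regular_matrix(compose(g1, g2)) == matmul(regular_matrix(g1), regular_matrix(g2))
    for g1 in G for g2 in G
)
print(is_representation)
```

The check covers all 36 pairs $(g_1, g_2)$, including noncommuting ones, which is why a nonabelian group was chosen for the sketch.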

Example 14.25 Let $G$ be a group of order two, consisting of elements $e$ and $g$,
where $g^2 = e$. A particular instance of this group is $S_2$, the symmetric group of
degree 2. The space $\mathsf{M}(G)$ is two-dimensional, and every function $f \in \mathsf{M}(G)$ is
defined by two numbers, $\alpha = f(e)$ and $\beta = f(g)$, that is, it can be identified with
the vector $(\alpha, \beta)$. As with any representation, $\mathcal{R}_e$ is the identity transformation.
Let us determine what $\mathcal{R}_g$ is. By formula (14.17), we have

$$(\mathcal{R}_g(f))(e) = f(g) = \beta, \quad (\mathcal{R}_g(f))(g) = f(g^2) = f(e) = \alpha.$$

This means that the linear transformation $\mathcal{R}_g$ takes the vector $(\alpha, \beta)$ to the vector
$(\beta, \alpha)$, that is, it represents a reflection with respect to the line $\alpha = \beta$.
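
The computation in this example is small enough to replay directly. The sketch below is only an illustration: it stores a function on the group of order two as the pair $(\alpha, \beta)$, and the function names `R_e` and `R_g` are hypothetical. It also checks that the lines $\alpha = \beta$ and $\alpha = -\beta$ are each mapped to themselves by the reflection.

```python
# A function f on G = {e, g} is stored as the pair (alpha, beta) = (f(e), f(g)).
def R_e(f):
    """The identity element acts as the identity transformation."""
    return f

def R_g(f):
    """(R_g f)(e) = f(g) = beta and (R_g f)(g) = f(g*g) = f(e) = alpha."""
    alpha, beta = f
    return (beta, alpha)

f = (3, 5)
swapped = R_g(f)                 # the coordinates are exchanged
back = R_g(R_g(f))               # g*g = e, so applying R_g twice restores f

# Vectors on the line alpha = beta are fixed;
# vectors on the line alpha = -beta are multiplied by -1.
symmetric = R_g((1, 1))
antisymmetric = R_g((1, -1))
print(swapped, back, symmetric, antisymmetric)
```

Both lines are therefore one-dimensional invariant subspaces of the regular representation of this group.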

Theorem 14.26 Every irreducible representation of a finite group $G$ is contained
in its regular representation $\mathcal{R}_g$.

Proof Let $\mathcal{A}_g$ be an irreducible representation with space $\mathsf{L}$. Let us denote by $l$ an
arbitrary nonnull linear function on the space $\mathsf{L}$ and let us associate with each vector
$\boldsymbol{x} \in \mathsf{L}$ the function $f(h) = l(\mathcal{A}_h(\boldsymbol{x})) \in \mathsf{M}(G)$ obtained when the vector
$\boldsymbol{x}$ is fixed and the element $h$ runs through all possible values of the group $G$. It is
obvious that in this way, we obtain a linear transformation $\mathcal{C} : \mathsf{L} \to \mathsf{M}'$ defined
by the relationship

$$\mathcal{C}(\boldsymbol{x}) = l(\mathcal{A}_h(\boldsymbol{x})), \tag{14.18}$$

where $\mathsf{M}'$ is some subspace of the vector space $\mathsf{M}(G)$. Here by construction,
$\mathcal{C}(\mathsf{L}) = \mathsf{M}'$, that is, $\mathsf{M}'$ is the image of the transformation $\mathcal{C}$.

We shall prove the following properties:

(1) For all elements $g \in G$ and vectors $\boldsymbol{x} \in \mathsf{L}$, we have the relationship

$$(\mathcal{C}\mathcal{A}_g)(\boldsymbol{x}) = (\mathcal{R}_g \mathcal{C})(\boldsymbol{x}). \tag{14.19}$$

(2) The subspace $\mathsf{M}'$ is invariant with respect to the representation $\mathcal{R}_g$.

(3) The transformation $\mathcal{C}$ is an isomorphism of the spaces $\mathsf{L}$ and $\mathsf{M}'$.

Comparing formulas (14.19) and (14.7), taking into account the remaining two
properties, we conclude that the irreducible representation $\mathcal{A}_g$ is equivalent to the
representation induced by the regular representation $\mathcal{R}_g$ in the invariant subspace
$\mathsf{M}' \subset \mathsf{M}(G)$. By virtue of the definitions given above, this means that $\mathcal{A}_g$ is
contained in $\mathcal{R}_g$, as asserted in the statement of the theorem.

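Proof of property (1). Let us set $\mathcal{C}(\boldsymbol{x}) = f \in \mathsf{M}(G)$. Then by definition,
$f(h) = l(\mathcal{A}_h(\boldsymbol{x}))$ for every element $h \in G$. Applying formula (14.17), we obtain
the relationship

$$(\mathcal{R}_g \mathcal{C})(\boldsymbol{x}) = \mathcal{R}_g(f) = \varphi, \tag{14.20}$$

where $\varphi$ is the function on the group $G$ defined by the relationship
$\varphi(h) = l(\mathcal{A}_{hg}(\boldsymbol{x}))$.

On the other hand, substituting the vector $\mathcal{A}_g(\boldsymbol{x})$ for $\boldsymbol{x}$ in formula (14.18), we
obtain the equality

$$\mathcal{C}(\mathcal{A}_g(\boldsymbol{x})) = (\mathcal{C}\mathcal{A}_g)(\boldsymbol{x}) = \varphi_1, \tag{14.21}$$

where the function $\varphi_1(h)$ is defined by the relationship

$$\varphi_1(h) = l(\mathcal{A}_h \mathcal{A}_g(\boldsymbol{x})) = l(\mathcal{A}_{hg}(\boldsymbol{x})),$$

and clearly, it coincides with $\varphi(h)$. Taking into account that $\varphi(h) = \varphi_1(h)$, we see
that equalities (14.20) and (14.21) yield that
$(\mathcal{C}\mathcal{A}_g)(\boldsymbol{x}) = (\mathcal{R}_g \mathcal{C})(\boldsymbol{x})$.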
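
Relationship (14.19) can be tested numerically. The sketch below is an illustration under stated assumptions: it uses the three-dimensional permutation representation of $S_3$ (which is not irreducible, but the proof of property (1) does not use irreducibility), takes the first coordinate as the linear function $l$, and stores functions on $G$ as dictionaries. All helper names are hypothetical.

```python
from itertools import permutations

G = list(permutations(range(3)))          # S_3 as permutation tuples

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

def A(g, x):
    """Permutation representation: A_g sends the basis vector e_i to e_{g(i)}."""
    y = [0] * 3
    for i in range(3):
        y[g[i]] = x[i]
    return tuple(y)

def l(x):
    """A nonnull linear function on L (assumed choice: the first coordinate)."""
    return x[0]

def C(x):
    """C(x) is the function h -> l(A_h(x)) on G, as in formula (14.18)."""
    return {h: l(A(h, x)) for h in G}

def R(g, f):
    """Regular representation (14.17): (R_g f)(h) = f(hg)."""
    return {h: f[compose(h, g)] for h in G}

x = (2, -1, 7)                            # an arbitrary sample vector
# Property (1): C(A_g(x)) and R_g(C(x)) are the same function on G.
intertwines = all(C(A(g, x)) == R(g, C(x)) for g in G)
print(intertwines)
```

The equality holds because $\mathcal{A}_h \mathcal{A}_g = \mathcal{A}_{hg}$, which is exactly the step used in the proof above.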

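Proof of property (2). We must prove that for every element $g \in G$, the image
$\mathcal{R}_g(\mathsf{M}')$ is contained in $\mathsf{M}'$. Let $f \in \mathsf{M}'$, that is, by the definition
of the image, $f = \mathcal{C}(\boldsymbol{x})$ for some $\boldsymbol{x} \in \mathsf{L}$. Then taking into account formula (14.19)
proved above, we have the equality

$$\mathcal{R}_g(f) = (\mathcal{R}_g \mathcal{C})(\boldsymbol{x}) = (\mathcal{C}\mathcal{A}_g)(\boldsymbol{x}) = \mathcal{C}(\boldsymbol{y}),$$

where the vector $\boldsymbol{y} = \mathcal{A}_g(\boldsymbol{x})$ is in $\mathsf{L}$, and by our construction, this means that
$\mathcal{R}_g(f) \in \mathsf{M}'$. This proves the required inclusion $\mathcal{R}_g(\mathsf{M}') \subset \mathsf{M}'$.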

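Proof of property (3). Since by construction, the space $\mathsf{M}'$ is the image of the
transformation $\mathcal{C} : \mathsf{L} \to \mathsf{M}'$, it remains only to show that the transformation
$\mathcal{C}$ is bijective, that is, that its kernel is equal to $(\boldsymbol{0})$. This means that we must
prove that the equality $\boldsymbol{x} = \boldsymbol{0}$ follows from the equality $\mathcal{C}(\boldsymbol{x}) = 0'$ (where $0'$
denotes the function identically equal to zero on the group $G$). Let us denote the kernel
of the transformation $\mathcal{C}$ by $\mathsf{L}'$. As we know, it is a subspace of $\mathsf{L}$. Let us show
that $\mathsf{L}'$ is invariant with respect to the representation $\mathcal{A}_g$.

Indeed, let us suppose that $\mathcal{C}(\boldsymbol{x}) = 0'$ for some vector $\boldsymbol{x} \in \mathsf{L}$, and let us set
$\boldsymbol{y} = \mathcal{A}_g(\boldsymbol{x})$. On applying the transformation $\mathcal{C}$ to the vector $\boldsymbol{y}$, taking into
account formula (14.19), we obtain

$$\mathcal{C}(\boldsymbol{y}) = \mathcal{C}(\mathcal{A}_g(\boldsymbol{x})) = (\mathcal{R}_g \mathcal{C})(\boldsymbol{x}) = \mathcal{R}_g(\mathcal{C}(\boldsymbol{x})) = \mathcal{R}_g(0') = 0'.$$

But from the irreducibility of the representation $\mathcal{A}_g$, it now follows that either
$\mathsf{L}' = \mathsf{L}$ or $\mathsf{L}' = (\boldsymbol{0})$. The former would mean that $l(\mathcal{A}_h(\boldsymbol{x})) = 0$ for all
$h \in G$ and $\boldsymbol{x} \in \mathsf{L}$. But then even for $h = e$, we would have the equality
$l(\mathcal{A}_e(\boldsymbol{x})) = l(\mathcal{E}(\boldsymbol{x})) = l(\boldsymbol{x}) = 0$ for all $\boldsymbol{x} \in \mathsf{L}$, which is impossible,
since in the definition of the transformation $\mathcal{C}$, the function $l$ was chosen to be not
identically zero. This means that the subspace $\mathsf{L}'$ is equal to $(\boldsymbol{0})$, which is what was
to be proved. □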

Corollary 14.27 A finite group has only a finite number of distinct (up to
equivalence) irreducible representations.

Example 14.28 Let $\mathcal{A}_g$ be the one-dimensional identity representation of the
group $G$. Then the space $\mathsf{L}$ is one-dimensional. Let $\boldsymbol{e}$ be a basis of $\mathsf{L}$. Let us
define the function $l$ by the condition $l(\alpha\boldsymbol{e}) = \alpha$. Formula (14.18) gives for the
vector $\boldsymbol{x} = \alpha\boldsymbol{e}$ the value

$$\mathcal{C}(\alpha\boldsymbol{e}) = f, \quad \text{where } f(h) = l(\mathcal{A}_h(\alpha\boldsymbol{e})) = l(\alpha\boldsymbol{e}) = \alpha.$$

Thus to the vector $\alpha\boldsymbol{e}$ is associated the function $f$, which takes for all $h \in G$ the
same value $\alpha$. Obviously, such constant functions indeed form an invariant subspace
with respect to the regular representation, and the representation induced in it is the
identity, as asserted by Theorem 14.26.


14.4 Representations of Abelian Groups

Let us first of all recall that we are assuming throughout that the space $\mathsf{L}$ of a
representation is complex.

Theorem 14.29 An irreducible representation of an abelian group is one-dimensional.

Proof Let $g$ be a fixed element of the group $G$. Its associated linear transformation
$\mathcal{A}_g : \mathsf{L} \to \mathsf{L}$ has at least one eigenvalue $\lambda$. Let $\mathsf{M} \subset \mathsf{L}$ be the
eigensubspace corresponding to the eigenvalue $\lambda$, that is, the collection of all vectors
$\boldsymbol{x} \in \mathsf{L}$ such that

$$\mathcal{A}_g(\boldsymbol{x}) = \lambda\boldsymbol{x}. \tag{14.22}$$

By construction, $\mathsf{M} \ne (\boldsymbol{0})$. We shall now prove that $\mathsf{M}$ is an invariant subspace of our
representation. It will then follow from the irreducibility of the representation that
$\mathsf{M} = \mathsf{L}$, and then equality (14.22) will hold for every vector $\boldsymbol{x} \in \mathsf{L}$. In other words,
$\mathcal{A}_g = \lambda\mathcal{E}$, and the matrix of the transformation $\mathcal{A}_g$ is equal to $\lambda E$. A matrix
of this type is called a scalar matrix. This reasoning holds for every $g \in G$; we have only
to note that the eigenvalue $\lambda$ in formula (14.22) depends on the element $g$, and the
remainder of the argument does not depend on it. Thus we may conclude that the
matrices of all transformations $\mathcal{A}_g$ are scalar matrices, and if $\dim \mathsf{L} > 1$, then every
subspace of the space $\mathsf{L}$ is invariant. Consequently, if a representation is irreducible,
it is one-dimensional.

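It remains to prove the invariance of the subspace $\mathsf{M}$. It is here that we shall
specifically use the commutativity of the group $G$. Let $\boldsymbol{x} \in \mathsf{M}$, $h \in G$. We shall
prove that $\mathcal{A}_h(\boldsymbol{x}) \in \mathsf{M}$. Indeed, if $\mathcal{A}_h(\boldsymbol{x}) = \boldsymbol{y}$, then

$$\mathcal{A}_g(\boldsymbol{y}) = \mathcal{A}_g(\mathcal{A}_h(\boldsymbol{x})) = \mathcal{A}_{gh}(\boldsymbol{x}) = \mathcal{A}_{hg}(\boldsymbol{x}) = \mathcal{A}_h(\mathcal{A}_g(\boldsymbol{x})) = \mathcal{A}_h(\lambda\boldsymbol{x}) = \lambda\mathcal{A}_h(\boldsymbol{x}) = \lambda\boldsymbol{y},$$

that is, the vector $\boldsymbol{y}$ belongs to $\mathsf{M}$. □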

In view of Theorem 14.29, every irreducible representation of an abelian group
can be represented in the form $\mathcal{A}_g = \chi(g)$, where $\chi(g)$ is a number. Condition
(14.1) can then be written in the following form:

$$\chi(g_1 g_2) = \chi(g_1)\chi(g_2). \tag{14.23}$$

Definition 14.30 A function $\chi(g)$ on an abelian group $G$ taking complex values
and satisfying relationship (14.23) is called a character.

By Theorem 14.29, every irreducible representation of a finite abelian group is
a character $\chi(g)$. On the other hand, it follows from Theorem 14.26 that this
representation is contained in the regular representation. In other words, in the space
$\mathsf{M}(G)$ of functions on the group $G$, there exists an invariant subspace $\mathsf{M}'$ in which
the regular representation induces a representation equivalent to ours. Since our
representation is one-dimensional, the subspace $\mathsf{M}'$ is also one-dimensional. Let some
function $f \in \mathsf{M}(G)$ be a basis in $\mathsf{M}'$. Then since the representation induced by the
regular representation in $\mathsf{M}'$ has matrix $\chi(g)$, and $\mathcal{R}_g(f)(h) = f(hg)$, we must have
the relationship

$$f(hg) = \chi(g) f(h).$$

Let us set $h = e$ in this equality and let us also set $f(e) = \alpha$. We obtain that
$f(g) = \alpha\chi(g)$, that is, we may take as a basis of the subspace $\mathsf{M}'$ the character $\chi$
itself (indeed, it is a function on $G$, and this means that $\chi \in \mathsf{M}(G)$). As we have seen,
we then have $\mathsf{M}(G) = \mathsf{M}' \oplus \mathsf{M}''$, where $\mathsf{M}''$ is also an invariant subspace. Applying
analogous arguments to $\mathsf{M}''$ and to all invariant subspaces of dimension greater than
1 that we obtain along the way, we finally arrive at a decomposition of the space
$\mathsf{M}(G)$ as a direct sum of one-dimensional invariant subspaces. We have thereby
proved the following result.

Theorem 14.31 The space $\mathsf{M}(G)$ of functions on a finite abelian group $G$ can be
decomposed as a direct sum of one-dimensional subspaces that are invariant with
respect to the regular representation. In each such subspace, one can take as a basis
vector some character $\chi(g)$. Then the matrix of the representation that is induced
in this subspace coincides with this same character $\chi(g)$.
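
As a concrete check of this statement, the sketch below works with the abelian group $\mathbb{Z}_2 \times \mathbb{Z}_2$, written additively. The choice of group and the explicit formula $\chi_{a,b}(x, y) = (-1)^{ax + by}$ for its characters are assumptions made for illustration and are not taken from the text; the helper names are hypothetical.

```python
from itertools import product

# G = Z_2 x Z_2 written additively: elements are pairs (x, y) with entries mod 2.
G = list(product(range(2), repeat=2))

def add(g, h):
    """Group operation: componentwise addition mod 2."""
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

def character(a, b):
    """chi_{a,b}(x, y) = (-1)^(a*x + b*y), stored as a dict on G."""
    return {g: (-1) ** (a * g[0] + b * g[1]) for g in G}

characters = [character(a, b) for a, b in G]

def R(g, f):
    """Regular representation (14.17); the group operation here is addition."""
    return {h: f[add(h, g)] for h in G}

# Each chi satisfies relationship (14.23).
multiplicative = all(
    chi[add(g, h)] == chi[g] * chi[h] for chi in characters for g in G for h in G
)
# Each chi spans an invariant line: R_g(chi) = chi(g) * chi.
eigen = all(
    R(g, chi) == {h: chi[g] * chi[h] for h in G} for chi in characters for g in G
)
# Distinct characters are orthogonal, hence linearly independent; there are |G| = 4.
gram = [[sum(c1[g] * c2[g] for g in G) for c2 in characters] for c1 in characters]
print(multiplicative, eigen, gram)
```

Since the four characters are linearly independent in the four-dimensional space $\mathsf{M}(G)$, the lines they span give the decomposition described in the theorem for this particular group.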

It is obvious that we thereby establish a bijective relationship between the
characters of the group $G$ and one-dimensional invariant subspaces of the space $\mathsf{M}(G)$
of functions on this group. Indeed, two distinct characters $\chi_1$ and $\chi_2$ cannot be basis
vectors of one and the same subspace: that would mean that

$$\chi_1(g) = \alpha\chi_2(g) \quad \text{for all } g \in G.$$

Setting here $g = e$, we obtain $\alpha = 1$, since $\chi_1$ and $\chi_2$ are homomorphisms of the
group $G$ into $\mathbb{C}$, and therefore, $\chi_1(e) = \chi_2(e) = 1$.

Since by Corollary 14.20, a regular representation can be decomposed into a
direct sum of irreducible representations, we obtain the following results for every
finite abelian group $G$.

Corollary 14.32 The characters form a basis of the space $\mathsf{M}(G)$ of functions on the
group $G$.

This assertion can be reformulated as follows. 

Corollary 14.33 The number of distinct characters of a group G is equal to its 
order. 

This follows from Corollary 14.32 and the fact that the dimension of the space
$\mathsf{M}(G)$ is equal to the order of the group $G$.

Corollary 14.34 Every function on the group $G$ is a linear combination of characters.

Example 14.35 Let $G = \{g\}$ be a cyclic group of finite order $n$, $g^n = e$. Let us
denote by $\xi_0, \ldots, \xi_{n-1}$ the distinct $n$th roots of 1, and let us set

$$\chi_i(g^k) = \xi_i^k, \quad k = 0, 1, \ldots, n - 1.$$

It is easily verified that $\chi_i$ is a character of the group $G$ and that the characters $\chi_i$
corresponding to $\xi_i$, the distinct $n$th roots of 1, are themselves distinct. Since their
number is equal to $|G|$, they must be all the characters of the group $G$. By
Corollary 14.32, they form a basis of the space $\mathsf{M}(G)$. In other words, in an
$n$-dimensional space, the vectors $(1, \xi_i, \ldots, \xi_i^{n-1})$ corresponding to the $n$th roots
of 1 form a basis.

This can also be verified directly by calculating the determinant consisting of the
coordinates of these vectors as a Vandermonde determinant (p. 41).
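
The statements of this example can be checked numerically. The following is a minimal sketch, assuming $n = 6$ and the numbering $\xi_i = e^{2\pi \mathrm{i}\, i/n}$ of the roots of unity; it also uses the standard orthogonality of these roots to compute the coefficients of the expansion, a fact that is not proved in the text. The names `root`, `chi`, and `close` are hypothetical.

```python
import cmath

n = 6  # assumed order of the cyclic group, chosen only for illustration

def root(i):
    """The nth root of unity xi_i = exp(2*pi*sqrt(-1)*i/n) (assumed numbering)."""
    return cmath.exp(2j * cmath.pi * i / n)

def chi(i, k):
    """chi_i(g^k) = xi_i^k."""
    return root(i) ** k

def close(z, w, tol=1e-9):
    """Equality of complex numbers up to rounding error."""
    return abs(z - w) < tol

# Each chi_i is multiplicative: chi_i(g^j g^k) = chi_i(g^j) chi_i(g^k).
multiplicative = all(
    close(chi(i, (j + k) % n), chi(i, j) * chi(i, k))
    for i in range(n) for j in range(n) for k in range(n)
)

# Expand an arbitrary function f on G in the basis of characters.
f = [3, 1, 4, 1, 5, 9]  # f[k] = f(g^k), sample values
coeffs = [sum(f[k] * chi(i, k).conjugate() for k in range(n)) / n for i in range(n)]
rebuilt = [sum(coeffs[i] * chi(i, k) for i in range(n)) for k in range(n)]
recovered = all(close(rebuilt[k], f[k]) for k in range(n))
print(multiplicative, recovered)
```

The coefficients computed here are those of the discrete Fourier transform of the sample values, which is one way to read Corollary 14.34 for a cyclic group.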

Example 14.36 Let us denote by $S$ the group of rotations of the circle in the plane.
The elements of the group $S$ correspond to points of the circle: if we associate with
a real number $\varphi$ the point of the circle with argument $\varphi$, then with any one point
of the circle will be associated numbers that differ from one another by an integer
multiple of $2\pi$. Therefore, this group $S$ is frequently called the circle group.

After choosing a certain integer $m$, let us associate with the point $t$ of the circle $S$
having argument $\varphi$ the number $\cos m\varphi + i \sin m\varphi$, where $i$ is the imaginary unit. It
is obvious that adding an integer multiple of $2\pi$ to $\varphi$ does not change this number,
which means that it is uniquely defined by the point $t \in S$. Let us set

$$\chi_m(t) = \cos m\varphi + i \sin m\varphi, \quad m = 0, \pm 1, \pm 2, \ldots. \tag{14.24}$$

It is not difficult to verify that the function $\chi_m(t)$ is a character of the group $S$. For
an infinite group such as $S$, it is natural to introduce into the definition of a character,
in addition to the requirement (14.23), the requirement that the function $\chi_m(t)$ be
continuous. For the group $S$, this requirement means that the real and imaginary parts
of the functions $\chi_m(t)$ must be continuous functions.

It is possible to prove that the characters $\chi_m(t)$ defined by formula (14.24) are
continuous and that they comprise all the continuous characters of the circle. This
explains to a large degree the role of the trigonometric functions $\cos m\varphi$ and $\sin m\varphi$
in mathematics: they are the real and imaginary parts of the continuous characters
of the circle.
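
The two properties claimed for $\chi_m$ (independence of the choice of argument, and the character property (14.23)) can be spot-checked numerically. The sketch below samples a few arguments and a few values of $m$; the sample values and the helper names are assumptions made for illustration, and a finite sample is of course not a proof.

```python
import math

def chi(m, phi):
    """chi_m at the point with argument phi: cos(m*phi) + i*sin(m*phi)."""
    return complex(math.cos(m * phi), math.sin(m * phi))

def close(z, w, tol=1e-9):
    """Equality of complex numbers up to rounding error."""
    return abs(z - w) < tol

angles = [0.0, 0.3, 1.7, 2.9, 4.4]   # sample arguments, chosen arbitrarily
orders = range(-3, 4)                # sample values of the integer m

# Well defined on the circle: adding 2*pi to the argument changes nothing.
periodic = all(
    close(chi(m, a + 2 * math.pi), chi(m, a)) for m in orders for a in angles
)

# Character property (14.23): composing two rotations adds their arguments.
multiplicative = all(
    close(chi(m, a + b), chi(m, a) * chi(m, b))
    for m in orders for a in angles for b in angles
)
print(periodic, multiplicative)
```

The multiplicativity check is the addition formulas for cosine and sine written in complex form.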

Corollary 14.34 asserts that every function on a finite abelian group can be
represented as a linear combination of characters. In the case of an infinite group such
as $S$, some analytic restrictions, which we shall not specify here, are naturally
imposed on such a function. We shall only mention the significance of functions on
the group $S$. Such a function $f(t)$ can be represented as a function $F(\varphi)$ of the
argument $\varphi$ of the point $t \in S$. It must not, however, depend on the choice of the
argument $\varphi$ of the point $t$, that is, it must not change on the addition to $\varphi$ of an integer
multiple of $2\pi$. In other words, $F(\varphi)$ must be a periodic function with period $2\pi$.
The analogue of Corollary 14.34 for the group $S$ asserts that such a function can be
represented as a linear combination (in the given case, infinite) of functions $\chi_m(\varphi)$,
$m = 0, \pm 1, \pm 2, \ldots$. In other words, this is the theorem that a periodic
function (with certain analytic restrictions) can be decomposed into a Fourier series.


Historical Note 


Here we shall present a brief chronology of the appearance of the concepts discussed
in this book. The development of mathematical ideas generally proceeds in such a 
way that some concepts gradually emerge from others. Therefore, it is generally 
impossible to fix accurately the appearance of some particular idea. We shall only 
point out the important milestones and, it goes without saying, shall do so only 
roughly. In particular, we shall limit our view to Western European mathematics. 

The principal stimulus was, of course, the creation of analytic geometry by
Fermat and Descartes in the seventeenth century. This made it possible to specify points
(on the line, in the plane, and in three-dimensional space) using numbers (one, two,
or three), to specify curves and surfaces by equations, and to classify them
according to the algebraic nature of their equations. In this regard, linear transformations
were used frequently, especially by Euler, in the eighteenth century.

Determinants (particularly as a symbolic apparatus for finding solutions of
systems of $n$ linear equations in $n$ unknowns) were considered by Leibniz in the
seventeenth century (even if only in a private letter) and in detail by Gabriel Cramer
in the eighteenth. It is of interest that they were constructed on the basis of the rule
of “general expansion” of the determinant, that is, on the basis of the most complex
(among those that we considered in Chap. 2) way of defining them. This definition
was discovered “empirically,” that is, conjectured on the basis of the formulas for
the solution of systems of linear equations in two and three unknowns. The broadest
use of determinants occurred in the nineteenth century, especially in the work of
Cauchy and Jacobi.

The concept of “multidimensionality,” that is, the passage from one, two, and
three coordinates to an arbitrary number, was stimulated by the development of
mechanics, where one considered systems with an arbitrary number of degrees of
freedom. The idea of extending geometric intuition and concepts to this case was
developed systematically by Cayley and Grassmann in the nineteenth century. At
the same time, it became clear that one must study quadrics in spaces of arbitrary
dimension (Jacobi and Sylvester in the nineteenth century). In fact, this question had
already been considered by Euler.


The study of concepts defined by a set of abstract axioms (groups, rings, algebras, 
fields) began as early as the nineteenth century in the work of Hamilton and Cayley, 
but it reached its full flowering in the twentieth century, chiefly in the schools of 
Emmy Noether and Emil Artin. 

The concept of a projective space was first investigated by Desargues and Pascal 
in the seventeenth century, but systematic work in this direction began only in the 
nineteenth century, beginning with the work of Poncelet. 

The axiomatic definition of vector spaces and Euclidean spaces as given in this
book finally broke with the primacy of coordinates. It was first rigorously formulated
almost simultaneously by Hermann Weyl and John von Neumann. Both came to
this from work on questions in physics. At that time, two versions of quantum
mechanics had been created: the “wave mechanics” of Schrödinger and the “matrix
mechanics” of Heisenberg. It was necessary to work out that in some sense, they were
“one and the same.”

Both mathematicians developed an axiomatic theory of Euclidean spaces and
vector spaces and showed that quantum-mechanical theories are connected with
two isomorphic spaces. However, the difference between those theories and what
we presented in this book lies in the fact that they worked with infinite-dimensional
spaces. In any case, for finite-dimensional spaces, there appeared an invariant (that
is, independent of the choice of coordinates) theory that by now has become
universally accepted.

The introduction of the axiomatic approach in geometry was discussed in
sufficient detail in Chap. 11, devoted to the hyperbolic geometry of Lobachevsky. Such
studies began at the end of the nineteenth century, but their definitive influence in
mathematics dates from the beginning of the twentieth century. The central figure
here was Hilbert. For example, he contributed to the application of geometric
intuition to many problems in analysis.